# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Create Series and DataFrames
* Perform basic operations with Series
* Explore DataFrame basics
* Select data from DataFrames
* Apply functions to Series and DataFrames

## Loading Libraries

In [2]:
# numpy - for arithmetic operations and high-level mathematical functions on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd

## What is a Pandas Series?

* **One-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, and *Python objects*.
* A Pandas Series is like a single column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series has a single data type (integer, float, string, etc.). If the values are of mixed types, Pandas stores them under the general `object` dtype.
* **Labeled Index**: Each element in a Series is associated with a label; together the labels form the *index*. Unique labels are common practice but not strictly required. Labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When an operation combines two Series, Pandas automatically aligns the data by index label, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice, and other data sources.
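
The vectorized-operation and alignment features listed above can be sketched as follows. The Series names and values (`jan`, `feb`) are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical data: two Series that share only some index labels
jan = pd.Series([100, 200, 300], index=["A", "B", "C"])
feb = pd.Series([10, 20, 30], index=["B", "C", "D"])

# Vectorized operation: every element is doubled without an explicit loop
doubled = jan * 2

# Alignment: values are matched by label, not by position.
# Labels found in only one of the two Series ("A" and "D") give NaN.
total = jan + feb

print(doubled)
print(total)
```

Because a missing value appears in the result, `total` is stored as `float64` even though both inputs hold integers.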

In [4]:
# example of a series from a list 
marks = [10,20,33,42,19,30]
#Series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying

In [5]:
#Example 1- Creating a series from a list
data = [10.5,11.2,10.7,9.9,10.2]
# Series
list_series = pd.Series(data, name="Student Marks")
list_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [6]:
#Data type
type(list_series)

pandas.core.series.Series

In [8]:
#Example 2- Creating a series from NumPy Array
data_arr = np.array(data) # Created an Array from a list

type(data_arr)

numpy.ndarray

In [12]:
#Series from Array
arr_series= pd.Series(data_arr, name="Array Series")
arr_series 

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [13]:
#Example 3 - Creating a series from a dictionary
data_dict = {
    "Prof":100,
    "Lumumba":250,
    "Carol":300,
    "Eve":450
}

type(data_dict)

dict

In [14]:
# series from dict
dict_series = pd.Series(data_dict, name="Sky Team")
dict_series

Prof       100
Lumumba    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [20]:
#Series with custom index labels
balance = [1000, 1500, 2000, 4000]
custom_labels = ['A','B','C','D'] #Custom indexes

custom_labels_series = pd.Series(data=balance, index=custom_labels, name="balances")
custom_labels_series

A    1000
B    1500
C    2000
D    4000
Name: balances, dtype: int64

## Basic Operations With Series

In [21]:
arr_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Array Series, dtype: float64

In [22]:
#Accessing elements in a series (3 is a label of the default integer index)
print(arr_series[3])

9.9
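
With the default index, the label `3` and the position `3` happen to coincide. A minimal sketch of the difference, assuming a hypothetical Series `s` whose integer labels do not match the positions, uses `.loc` for label-based access and `.iloc` for position-based access:

```python
import pandas as pd

# Hypothetical Series: same values as above, but with labels 10, 20, ... 50
s = pd.Series([10.5, 11.2, 10.7, 9.9, 10.2], index=[10, 20, 30, 40, 50])

by_label = s.loc[40]      # look up the element whose label is 40
by_position = s.iloc[3]   # look up the fourth element (positions start at 0)

print(by_label, by_position)
```

Both lookups return the same element here. Writing `.loc` or `.iloc` explicitly makes it clear which kind of lookup is intended.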


In [23]:
dict_series

Prof       100
Lumumba    250
Carol      300
Eve        450
Name: Sky Team, dtype: int64

In [40]:
#Accessing elements in a series by label
print(dict_series['Carol'])

300


In [25]:
custom_labels_series

A    1000
B    1500
C    2000
D    4000
Name: balances, dtype: int64

In [26]:
#Accessing elements in a series by custom label
print(custom_labels_series['C'])

2000


In [27]:
#Slicing a series by label (both endpoints are included)
print(custom_labels_series['C':'D'])

C    2000
D    4000
Name: balances, dtype: int64
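
A label slice includes its end label, while a positional slice excludes its end position, as in ordinary Python slicing. The sketch below rebuilds the balances Series under the assumed name `balances` so that it runs on its own:

```python
import pandas as pd

# Same values and labels as custom_labels_series, rebuilt for a standalone example
balances = pd.Series([1000, 1500, 2000, 4000],
                     index=["A", "B", "C", "D"], name="balances")

label_slice = balances.loc["B":"C"]    # labels B and C: the end label is included
position_slice = balances.iloc[1:2]    # position 1 only: the end position is excluded

print(label_slice)
print(position_slice)
```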


In [28]:
#Arithmetic operations
#Divide every balance by 100
x = custom_labels_series/100
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: balances, dtype: float64

In [30]:
#filter elements
x_filtered = x[x>=15]
x_filtered

B    15.0
C    20.0
D    40.0
Name: balances, dtype: float64
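
Filtering works through a boolean mask: the comparison produces a Series of `True`/`False` values, and indexing with that mask keeps only the rows marked `True`. A small sketch, rebuilding `x` so it runs on its own; the combined condition is an added illustration and not part of the original notebook:

```python
import pandas as pd

# Same values as x above, rebuilt for a standalone example
x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="balances")

# The comparison is vectorized and returns a boolean Series
mask = x >= 15
print(mask)

# Conditions are combined with & (and) or | (or); each needs its own parentheses
combined = x[(x >= 15) & (x < 40)]
print(combined)
```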

In [31]:
#Basic summary Statistics
x

A    10.0
B    15.0
C    20.0
D    40.0
Name: balances, dtype: float64

In [32]:
#Mean
mean = x.mean()
print(mean)

21.25


In [33]:
#std
std = x.std()
print(std)

13.149778198382917


In [36]:
#Max Value
max_value = x.max()  # avoid naming the variable max, which would shadow the built-in
print(max_value)

40.0


In [39]:
#Total Value
total = x.sum()  # avoid naming the variable sum, which would shadow the built-in
print(total)

85.0
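
The same statistics can also be computed in one call. The sketch below uses `Series.agg` with a list of function names and rebuilds `x` so it runs on its own; the name `summary` is chosen here for illustration.

```python
import pandas as pd

# Same values as x above, rebuilt for a standalone example
x = pd.Series([10.0, 15.0, 20.0, 40.0],
              index=["A", "B", "C", "D"], name="balances")

# agg returns a new Series indexed by the names of the functions applied
summary = x.agg(["mean", "std", "max", "sum"])
print(summary)
```

Individual results can then be read by label, for example `summary["mean"]`.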
