# Introduction to Pandas Series and DataFrames

## Objectives

* Understand Pandas Series and DataFrames
* Create Series and DataFrames
* Perform basic operations with Series
* Explore DataFrame basics
* Select data from DataFrames
* Apply functions to Series and DataFrames

## Loading Libraries

In [1]:
# numpy - for arithmetic operations and high-level mathematical functions that operate on arrays
import numpy as np
# pandas - for working with relational or labeled data
import pandas as pd

## What is a Pandas Series?

* A **one-dimensional** labeled array capable of holding data of any type, such as *integers*, *strings*, *floats*, *Python objects*, etc.
* A Pandas Series is like a single column in a table.


### Key features of a Pandas Series

* **Homogeneous Data**: A Series has a single data type (its `dtype`), such as integer, float, or string. Values of mixed types are stored under the generic `object` dtype.
* **Labeled Index**: Each element in a Series is associated with a label called an *index*. Unique labels are common practice, though not strictly required. The labels only need to be hashable, i.e. usable as keys in a dictionary. The index allows for easy and efficient data retrieval and manipulation.
* **Vectorized Operations**: Series support vectorized operations, i.e. you can apply an operation to the entire Series without writing an explicit loop.
* **Alignment of Data**: When performing operations between Series, Pandas automatically aligns the data on index labels, which simplifies data manipulation.
* **Creation**: A Series can be created from a list, a NumPy array, a dictionary, a DataFrame slice, and other data sources.
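
The features listed above can be seen in a short sketch. The Series `a`, `b`, and `mixed` and their labels are made-up illustrations, not data from this notebook.

```python
import pandas as pd

# Two small Series with partially overlapping labels (made-up data)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Vectorized: the multiplication applies to every element, no loop needed
doubled = a * 2

# Alignment: values are matched by label, not by position;
# labels present in only one Series produce NaN in the result
total = a + b

# A list of mixed types still yields a single dtype: the generic `object`
mixed = pd.Series([1, "two", 3.0])

print(doubled)
print(total)
print(mixed.dtype)
```

Only the labels `y` and `z` exist in both Series, so only those two positions in `total` hold numbers; `x` and `w` come out as `NaN`.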

In [2]:
# example of a series from a list 
marks = [10,20,33,42,19,30]

# series
marks_series = pd.Series(marks)
marks_series

0    10
1    20
2    33
3    42
4    19
5    30
dtype: int64

## Creating and Displaying a Series

In [3]:
# Creating a series from a list
data = [10.5, 11.2, 10.7, 9.9, 10.2]

# series
list_series = pd.Series(data, name="Student Marks")
list_series 

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
Name: Student Marks, dtype: float64

In [4]:
# data type
type(list_series)

pandas.core.series.Series

In [6]:
# creating a series from a numpy array
data_arr = np.array(data)  # array built from the list above

type(data_arr)

numpy.ndarray

In [12]:
# series
array_series = pd.Series(data_arr)
array_series

0    10.5
1    11.2
2    10.7
3     9.9
4    10.2
dtype: float64

In [7]:
# Series from a dictionary of key-value pairs (keys become the index labels)
data_dict = {
    "Prof": 100,
    "Dom": 250,
    "Caro": 300,
    "Eve": 450
}

type(data_dict)

dict

In [9]:
# series
dict_series = pd.Series(data_dict, name="Sky team")
dict_series

Prof    100
Dom     250
Caro    300
Eve     450
Name: Sky team, dtype: int64
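
When building a Series from a dictionary, it can help to see what happens if an explicit `index` is also passed. The sketch below reuses the dictionary from above; the label `"Zed"` is a hypothetical key that is deliberately absent from it.

```python
import pandas as pd

data_dict = {"Prof": 100, "Dom": 250, "Caro": 300, "Eve": 450}

# The index argument selects and orders the labels to keep.
# "Zed" is a hypothetical label that is not a key in the dictionary,
# so its value is filled with NaN.
selected = pd.Series(data_dict, index=["Eve", "Prof", "Zed"])
print(selected)
```

Because `NaN` is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.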

## Basic Operations With Series

In [13]:
# accessing elements in a series
print(array_series[3])

9.9


In [10]:
# accessing elements in a series
print(dict_series["Prof"])

100
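
Square brackets look values up by index label, as in the two cells above. As a minimal sketch, the `.loc` and `.iloc` accessors make the choice between label-based and position-based access explicit. The Series is rebuilt here so the block runs on its own.

```python
import pandas as pd

dict_series = pd.Series({"Prof": 100, "Dom": 250, "Caro": 300, "Eve": 450})

by_label = dict_series.loc["Dom"]             # label-based lookup
by_position = dict_series.iloc[0]             # position-based lookup (first element)

label_slice = dict_series.loc["Dom":"Caro"]   # label slices include both endpoints
position_slice = dict_series.iloc[1:3]        # position slices exclude the stop

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Both slices return the `Dom` and `Caro` entries. Being explicit matters most when the index itself holds integers, where a bare `series[3]` could be read either way.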


In [15]:
# arithmetic operations are applied element-wise
# divide each value by 100 (percent to fraction)
x = array_series / 100
x

0    0.105
1    0.112
2    0.107
3    0.099
4    0.102
dtype: float64

In [16]:
# filter elements: keep only the values greater than or equal to 0.1
x_filtered = x[x >= 0.1]
x_filtered

0    0.105
1    0.112
2    0.107
4    0.102
dtype: float64
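
The filter above works through a boolean mask: the comparison produces a Series of `True`/`False` values, and indexing with it keeps the `True` rows. A small sketch, with the values copied from `x` so that it runs on its own (the names `mask` and `in_range` are illustrative):

```python
import pandas as pd

x = pd.Series([0.105, 0.112, 0.107, 0.099, 0.102])

# The comparison is evaluated element-wise and returns a boolean Series
mask = x >= 0.1
print(mask)

# Combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
in_range = x[(x >= 0.1) & (x < 0.11)]
print(in_range)
```

Note that the filtered result keeps the original index labels, which is why the output above skips label `3`.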

In [17]:
# basic summary statistics
x

0    0.105
1    0.112
2    0.107
3    0.099
4    0.102
dtype: float64

In [18]:
# mean
mean = x.mean()
print(mean)

0.10499999999999998


In [19]:
# standard deviation
std = x.std()
print(std)

0.004949747468305828


In [21]:
# max (stored as max_value to avoid shadowing Python's built-in max)
max_value = np.round(x, 3).max()
print(max_value)

0.112
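
The individual statistics above can also be obtained in one call. The following sketch uses `Series.describe()` and shows how the `ddof` argument affects the standard deviation; the values are copied from `x` so the block is self-contained.

```python
import pandas as pd

x = pd.Series([0.105, 0.112, 0.107, 0.099, 0.102])

# count, mean, std, min, quartiles, and max in a single Series
summary = x.describe()
print(summary)

# Series.std() defaults to ddof=1 (sample standard deviation);
# ddof=0 gives the population standard deviation, NumPy's default for np.std
sample_std = x.std()
population_std = x.std(ddof=0)
print(sample_std, population_std)
```

The population value is slightly smaller than the sample value because it divides by `n` instead of `n - 1`.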
