---
# Course 2 - Week03 - Day01 - Practice Exercise
---

# Agenda
- Pandas
  - Series
    - Creation
    - Indexing
    - Functions
  - Dataframe
    - Creation
    - Indexing
    - Functions

# Pandas in Python
- Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built around its powerful data structures
- Using Pandas, we can accomplish five typical steps in processing and analyzing data, regardless of its origin: load, prepare, manipulate, model, and analyze
- Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics

### Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing
- Tools for loading data into in-memory data objects from different file formats
- Data alignment and integrated handling of missing data
- Reshaping and pivoting of data sets
- Label-based slicing, indexing and subsetting of large data sets
- Columns from a data structure can be deleted or inserted
- Group by data for aggregation and transformations
- High performance merging and joining of data
- Time Series functionality

### Pandas provides the following data structures:
- Series
- DataFrame

These data structures are built on top of NumPy arrays, which makes them fast

### Mutability
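- All Pandas data structures (Series, DataFrame) are value-mutable: the values they contain can be changed. DataFrame is also size-mutable (for example, columns can be inserted or deleted), but Series is size-immutable: its length cannot be changed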

## 1. Pandas Series
- Series is a one-dimensional, array-like structure with homogeneous data
- Size-immutable
- Data values are mutable

### Series creation

#### Create an empty series

In [None]:
import pandas as pd
import numpy as np
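A minimal sketch of one possible answer, assuming pandas is installed; `empty_series` is an illustrative name. Passing `dtype` explicitly avoids the default-dtype warning that older pandas versions print for a bare `pd.Series()`.

```python
import pandas as pd

# An empty Series with an explicit dtype (no data, no index labels)
empty_series = pd.Series(dtype="float64")
print(empty_series)
```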

#### Creating series from an array ([20, 30])
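One way to build a Series from a NumPy array, sketched with the `[20, 30]` values from the heading; `arr` and `series_from_array` are illustrative names. With no `index` argument, pandas assigns a default 0-based integer index.

```python
import numpy as np
import pandas as pd

arr = np.array([20, 30])
series_from_array = pd.Series(arr)  # index defaults to 0, 1
print(series_from_array)
```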

#### Creating series from a list

In [26]:
letters = ['p', 'y', 't', 'h', 'o', 'n']  # avoid the name `list`: it would shadow the built-in list() used later (e.g. list('AB'))
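A possible solution sketch for the list exercise. It stores the characters in `letters` (an illustrative name) rather than `list`, so the built-in `list()` stays available for later cells such as `list('AB')`.

```python
import pandas as pd

letters = ['p', 'y', 't', 'h', 'o', 'n']
series_from_list = pd.Series(letters)  # one element per list item, index 0..5
print(series_from_list)
```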

#### Creating series from dictionary

In [4]:
dictionary = {0: 21, 1: 400, 2: 39, 3: 10}

In [29]:
dictionary1 = {'w': 21, 'x': 400, 'y': 39, 'z': 10}
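A sketch for both dictionaries above: the keys become the index labels and the values become the data. The result names `s_int_keys` and `s_str_keys` are illustrative.

```python
import pandas as pd

dictionary = {0: 21, 1: 400, 2: 39, 3: 10}
dictionary1 = {'w': 21, 'x': 400, 'y': 39, 'z': 10}

# Dictionary keys become index labels; dictionary values become the data
s_int_keys = pd.Series(dictionary)
s_str_keys = pd.Series(dictionary1)
print(s_int_keys)
print(s_str_keys)
```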

### Pandas Series methods
- describe()
- append() (removed in pandas 2.0; use pd.concat() instead)
- apply()
- count()
- copy() and many more...

#### Show the summary statistics for the series using the describe() function

In [6]:
series = pd.Series([2,4,6,8,10,12])
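One possible answer, reusing the series from the cell above. `describe()` returns a Series of summary statistics (count, mean, std, min, quartiles, max), so individual values can be read back by label.

```python
import pandas as pd

series = pd.Series([2, 4, 6, 8, 10, 12])
stats = series.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(stats)
```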

#### Show the count for the given series using count() function

In [7]:
series = pd.Series([2,4,6,8,10,12,np.nan])
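A sketch showing that `count()` reports non-null values only, so the trailing `np.nan` is excluded while `len()` still counts it. `non_null` is an illustrative name.

```python
import numpy as np
import pandas as pd

series = pd.Series([2, 4, 6, 8, 10, 12, np.nan])
non_null = series.count()  # skips NaN
print(non_null, len(series))
```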

#### Append the given series using append() (removed in pandas 2.0; use pd.concat() instead)

In [8]:
series1 = pd.Series([10, 20])
series2 = pd.Series([40, 60], index=[2, 3])
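`Series.append()` was removed in pandas 2.0, so this sketch uses `pd.concat()` instead, which works in both older and current pandas releases. Because `series2` already carries the index `[2, 3]`, the combined index runs from 0 to 3 without renumbering. `combined` is an illustrative name.

```python
import pandas as pd

series1 = pd.Series([10, 20])
series2 = pd.Series([40, 60], index=[2, 3])

# Concatenate end to end; original index labels are kept
combined = pd.concat([series1, series2])
print(combined)
```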

#### Square the given series (series1 & series2) using the apply() function
- A lambda is a one-line anonymous function
- Square the values in series1 & series2 using a lambda and print the new values
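A minimal sketch of the squaring exercise: `apply()` calls the lambda once per element and keeps each Series' original index. The `squared1`/`squared2` names are illustrative.

```python
import pandas as pd

series1 = pd.Series([10, 20])
series2 = pd.Series([40, 60], index=[2, 3])

# The lambda receives one element at a time
squared1 = series1.apply(lambda x: x ** 2)
squared2 = series2.apply(lambda x: x ** 2)
print(squared1)
print(squared2)
```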

#### Use the copy() function to copy series1 & series2
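A sketch of `copy()`, assuming the default `deep=True`: changing the copy leaves the original Series untouched. Names ending in `_copy` are illustrative.

```python
import pandas as pd

series1 = pd.Series([10, 20])
series2 = pd.Series([40, 60], index=[2, 3])

series1_copy = series1.copy()  # deep copy by default
series2_copy = series2.copy()

series1_copy[0] = 999  # modify only the copy
print(series1)         # the original still holds 10 at label 0
print(series1_copy)
```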

### Binary operation and aggregation methods on series
- add()
- sub()
- mul()
- div()
- sum()
- mean()

In [12]:
series = pd.Series(data= [2000.56, 1234.00, 2133.67, 7890.80, 2234.10], name = "Salary")
series

0    2000.56
1    1234.00
2    2133.67
3    7890.80
4    2234.10
Name: Salary, dtype: float64

#### Using add(), add the series [2,4,6,8,10] to the above series named "Salary"

#### Using sub(), subtract the series [2,4,6,8,10] from the above series named "Salary"

#### Using mul(), multiply the above series named "Salary" by the series [2,4,6,8,10]

#### Using div(), divide the above series named "Salary" by the series [2,4,6,8,10]

#### Find the sum of the series

#### Find the mean of the series
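The arithmetic and aggregation exercises above can be sketched in one cell. This assumes the list `[2,4,6,8,10]` is wrapped in a Series named `other` (an illustrative name), and stores the Salary series as `salary` instead of `series`. Since both Series share the index 0 to 4, each operation pairs values label by label.

```python
import pandas as pd

salary = pd.Series(data=[2000.56, 1234.00, 2133.67, 7890.80, 2234.10], name="Salary")
other = pd.Series([2, 4, 6, 8, 10])

# Element-wise operations align on the index
added = salary.add(other)        # salary + other
subtracted = salary.sub(other)   # salary - other
multiplied = salary.mul(other)   # salary * other
divided = salary.div(other)      # salary / other

# Reductions over the whole Series
total = salary.sum()
average = salary.mean()
print(added, subtracted, multiplied, divided, total, average, sep="\n")
```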

## 2. Pandas DataFrame
- DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure
- Data and size are both mutable
- Labeled axes (rows and columns)
- Can perform arithmetic operations on rows and columns
- Columns can be of different types

### Dataframe creation

#### Create a dataframe(df) from the given dictionary

In [14]:
import pandas as pd
import numpy as np
dictionary = {'col1': [10.0, 20.0], 'col2': [30.0, np.nan]}
dictionary

{'col1': [10.0, 20.0], 'col2': [30.0, nan]}
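A possible answer for the dictionary exercise: each key becomes a column name and each list becomes that column's values, with `np.nan` marking the missing entry.

```python
import numpy as np
import pandas as pd

dictionary = {'col1': [10.0, 20.0], 'col2': [30.0, np.nan]}
df = pd.DataFrame(dictionary)  # keys -> column names
print(df)
```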

#### Creating dataframe (df) from a NumPy ndarray [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
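A sketch for the ndarray exercise. Without a `columns` argument, pandas labels the columns 0, 1, 2; pass `columns=[...]` if named columns are wanted.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(arr)  # each inner list becomes a row
print(df)
```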

#### Creating dataframe(df) from a list of dictionaries [{"col1": 10, "col2": 20}, {"col1": 50, "col2": 100, "col3": 200}]
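A sketch for the list-of-dictionaries exercise. pandas takes the union of the keys as columns, so the first record, which has no `col3`, gets `NaN` there. `records` is an illustrative name.

```python
import pandas as pd

records = [{"col1": 10, "col2": 20}, {"col1": 50, "col2": 100, "col3": 200}]
df = pd.DataFrame(records)  # one row per dictionary
print(df)
```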

#### Creating dataframe(df) from a dictionary of lists, list1 = [10.0, 20.0, 30.0, 40.0], list2 = [40.0, 30.0, 20.0, 10.0]
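A sketch for the dictionary-of-lists exercise. The column names `'list1'` and `'list2'` are an arbitrary choice, since the exercise does not specify them.

```python
import pandas as pd

list1 = [10.0, 20.0, 30.0, 40.0]
list2 = [40.0, 30.0, 20.0, 10.0]

# Each list becomes one column; the lists must have equal length
df = pd.DataFrame({'list1': list1, 'list2': list2})
print(df)
```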

### Pandas Indexing
- Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame
- Indexing could mean selecting all the rows and columns or some of the rows or columns
- Can be done using:
  - loc[ ]
  - iloc[ ]

In [23]:
df4 = pd.DataFrame([[100, 2500], [490, 789], [500, 1500]],
     index=['pune', 'mumbai', 'nagpur'],
     columns=['max_speed(kmph)', 'distance(km)'])
df4

|        | max_speed(kmph) | distance(km) |
|--------|-----------------|--------------|
| pune   | 100             | 2500         |
| mumbai | 490             | 789          |
| nagpur | 500             | 1500         |


### Perform indexing based on the given index of the above dataframe

#### Use the .loc indexer (label-based)
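A sketch of label-based selection with `.loc`, rebuilding `df4` so the cell runs on its own. The specific rows and columns picked are illustrative; note that a `.loc` slice includes its end label.

```python
import pandas as pd

df4 = pd.DataFrame([[100, 2500], [490, 789], [500, 1500]],
                   index=['pune', 'mumbai', 'nagpur'],
                   columns=['max_speed(kmph)', 'distance(km)'])

row = df4.loc['mumbai']                                 # one row as a Series
cell = df4.loc['pune', 'distance(km)']                  # a single value
subset = df4.loc[['pune', 'nagpur'], ['max_speed(kmph)']]
span = df4.loc['pune':'mumbai']                         # end label is included
print(row, cell, subset, span, sep="\n")
```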

#### Use the .iloc indexer (position-based)
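The positional counterpart with `.iloc`, again on a rebuilt `df4`. Unlike `.loc`, an `.iloc` slice excludes its end position, and negative positions count from the end. The selections are illustrative.

```python
import pandas as pd

df4 = pd.DataFrame([[100, 2500], [490, 789], [500, 1500]],
                   index=['pune', 'mumbai', 'nagpur'],
                   columns=['max_speed(kmph)', 'distance(km)'])

first_row = df4.iloc[0]      # first row as a Series
cell = df4.iloc[2, 1]        # row 2, column 1
first_two = df4.iloc[0:2]    # positions 0 and 1 (end excluded)
last_col = df4.iloc[:, -1]   # every row, last column
print(first_row, cell, first_two, last_col, sep="\n")
```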

### Iterating over rows and columns of dataframe
- To iterate over rows and columns, we can use three methods:
  - iterrows() (rows)
  - itertuples() (rows)
  - iteritems() (columns; renamed to items(), and iteritems() was removed in pandas 2.0)

In [29]:
import pandas as pd
data = {'name':["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
df = pd.DataFrame(data)
df

|   | name   | degree | score |
|---|--------|--------|-------|
| 0 | John   | MBA    | 90    |
| 1 | Puneet | BCA    | 40    |
| 2 | Sudhir | M.Tech | 80    |
| 3 | Geeta  | MBA    | 98    |


#### iterrows()
- As the name suggests, it iterates over DataFrame rows as (index, Series) pairs

#### Use iterrows() function on the above dataframe and display the output
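A possible answer using the same data as above, stored as `data` rather than `dict`. Each `row` is a Series keyed by column name, which is why values are read with `row['name']` (`row.name` would return the index label instead). `rows_seen` is an illustrative name used only to collect results.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

rows_seen = []
for index, row in df.iterrows():  # row is a Series keyed by column name
    print(index, row['name'], row['degree'], row['score'])
    rows_seen.append((index, row['name']))
```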

#### itertuples()
- Iterate over DataFrame rows as namedtuples

#### Use itertuples() function on the above dataframe and display the output
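A sketch with `itertuples()`: each row is a namedtuple whose first field, `Index`, holds the index label and whose other fields are named after the columns. `tuples_seen` is an illustrative name used only to collect results.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

tuples_seen = []
for row in df.itertuples():  # namedtuple: Index, name, degree, score
    print(row.Index, row.name, row.degree, row.score)
    tuples_seen.append((row.Index, row.name, row.score))
```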

#### items() (formerly iteritems())
- Iterates over the DataFrame columns, yielding tuples of the column name and the column contents as a Series
- iteritems() was removed in pandas 2.0; items() behaves the same way

#### Use items() on the above dataframe and display the output
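Since `iteritems()` no longer exists in pandas 2.0 and later, this sketch uses `items()`, which yields the same `(column name, Series)` pairs. `columns_seen` is an illustrative name.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

columns_seen = []
for col_name, col_values in df.items():  # one (name, Series) pair per column
    print(col_name, list(col_values))
    columns_seen.append(col_name)
```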

### Dataframe methods
- Similar to Series, DataFrame also has methods and attributes such as:
  - describe()
  - count()
  - append() (removed in pandas 2.0; use pd.concat() instead)
  - apply()
  - columns (attribute)
  - dtypes (attribute)
  - astype()
  - copy() and many more...

In [15]:
import pandas as pd
data = {'name':["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
df = pd.DataFrame(data)
df

|   | name   | degree | score |
|---|--------|--------|-------|
| 0 | John   | MBA    | 90    |
| 1 | Puneet | BCA    | 40    |
| 2 | Sudhir | M.Tech | 80    |
| 3 | Geeta  | MBA    | 98    |


#### Use describe() function for the above dataframe

#### Show the counts for the above dataframe using count() function
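One possible answer to the `describe()` and `count()` exercises above. By default `describe()` summarizes only numeric columns; `include='all'` adds count, unique, top and freq for the text columns. The variable names are illustrative.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

summary = df.describe()                    # numeric columns only
summary_all = df.describe(include='all')   # text columns as well
counts = df.count()                        # non-null values per column
print(summary, summary_all, counts, sep="\n\n")
```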

#### Combine the dataframes below using append() (removed in pandas 2.0; use pd.concat() instead)

In [17]:
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
df2

|   | A | B |
|---|---|---|
| x | 1 | 2 |
| y | 3 | 4 |


In [18]:
df3 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])
df3

|   | A | B |
|---|---|---|
| x | 5 | 6 |
| y | 7 | 8 |
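`DataFrame.append()` was removed in pandas 2.0, so this sketch stacks `df2` and `df3` with `pd.concat()`. By default the original `x`/`y` labels repeat; `ignore_index=True` renumbers the rows instead. `stacked` and `renumbered` are illustrative names.

```python
import pandas as pd

df2 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
df3 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])

stacked = pd.concat([df2, df3])                         # keeps x, y, x, y
renumbered = pd.concat([df2, df3], ignore_index=True)   # new index 0..3
print(stacked)
print(renumbered)
```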


#### Use a lambda with the apply() function to square the dataframes df2 & df3

Expected output for df2:

|   | A | B  |
|---|---|----|
| x | 1 | 4  |
| y | 9 | 16 |
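A sketch that reproduces the output above for `df2` and applies the same lambda to `df3`. On a DataFrame, `apply()` passes each column to the lambda as a Series, and `** 2` squares it element-wise.

```python
import pandas as pd

df2 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
df3 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])

# The lambda receives one column (a Series) per call
df2_squared = df2.apply(lambda col: col ** 2)
df3_squared = df3.apply(lambda col: col ** 2)
print(df2_squared)
print(df3_squared)
```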


#### Display the names of the columns for the dataframe df

#### Display the data types for the dataframe df
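A sketch for the two display exercises above. `columns` and `dtypes` are attributes, not methods, so they are accessed without parentheses.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

print(df.columns)  # column labels (an Index object)
print(df.dtypes)   # one dtype per column
```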

#### Convert the data type of the score column to float using the astype() function
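A sketch using `astype()` (the method is `astype`, without an `s`). The conversion returns a new Series, so it is assigned back to the column.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

df['score'] = df['score'].astype(float)  # int64 -> float64
print(df.dtypes)
```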

#### Create a copy of the dataframe df2 with copy() function
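A sketch of `copy()` on `df2`, assuming the default deep copy: editing the copy does not touch the original. `df2_copy` is an illustrative name.

```python
import pandas as pd

df2 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])

df2_copy = df2.copy()          # deep copy by default
df2_copy.loc['x', 'A'] = 100   # change only the copy
print(df2)
print(df2_copy)
```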

### To obtain information about the dataset

#### Use the head() & tail() functions on dataframe df to display the first and last rows of the dataset

#### tail() function

#### Use info() function on dataframe df to display the information about the dataset
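A sketch covering the `head()`, `tail()` and `info()` exercises above. `head(n)` and `tail(n)` return the first and last `n` rows (5 by default); `info()` prints the index, column dtypes, non-null counts and memory usage, and returns `None`. The choice of `n=2` is illustrative.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

print(df.head(2))  # first 2 rows
print(df.tail(2))  # last 2 rows
df.info()          # prints a concise summary of the DataFrame
```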

### Rank the dataframe
#### Find the rank for the dataframe df using the methods 'min' and 'dense'. Sort it in both ascending and descending order and display it
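One possible reading of the rank exercise, sketched under assumptions: the numeric `score` column and the text `degree` column are ranked separately, and the new column names are illustrative. Because `score` has no ties, `'min'` and `'dense'` agree on it. The tie in `degree` (two `MBA` rows) shows where they differ. Strings rank lexicographically, so `M.Tech` sorts before `MBA`.

```python
import pandas as pd

data = {'name': ["John", "Puneet", "Sudhir", "Geeta"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

# 'min': tied values share the lowest rank of their group, later ranks skip ahead
# 'dense': like 'min', but ranks increase by exactly 1 between groups
df['score_rank_min_asc'] = df['score'].rank(method='min')
df['score_rank_dense_desc'] = df['score'].rank(method='dense', ascending=False)
df['degree_rank_min_desc'] = df['degree'].rank(method='min', ascending=False)
df['degree_rank_dense_desc'] = df['degree'].rank(method='dense', ascending=False)

print(df.sort_values('score_rank_min_asc'))                   # ascending
print(df.sort_values('score_rank_min_asc', ascending=False))  # descending
```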

#### Check the missing/null values using .isnull() function and display their percentages too
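The student dataframe `df` has no missing values, so this sketch reuses the `dictionary` from the DataFrame creation cell, which contains one `np.nan`. Taking the mean of the boolean `isnull()` result gives the fraction of missing values per column; multiplying by 100 turns it into a percentage. `frame` and the result names are illustrative.

```python
import numpy as np
import pandas as pd

dictionary = {'col1': [10.0, 20.0], 'col2': [30.0, np.nan]}
frame = pd.DataFrame(dictionary)

null_counts = frame.isnull().sum()           # missing values per column
null_percent = frame.isnull().mean() * 100   # percentage of rows missing
print(null_counts)
print(null_percent)
```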

## Happy Learning :)