<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/1200px-Pandas_logo.svg.png" height="70%" width="70%" >
<br><br>

## *PYTHON DATA ANALYSIS LIBRARY*

### Installing and importing pandas :

In [0]:
pip install pandas

In [0]:
import pandas as pd

### Pandas Series :
[pandas.Series()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas-series): a one-dimensional ndarray with axis labels (including time series).

In [0]:
phone = ["iphone 10","Note 8","Mate 30 pro","iphone 4s","Galaxy S20 Ultra"]
pd.Series(phone)
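
The "axis labels" in the definition above are the Series index. As a minimal sketch (the ratings and the name `ratings` are illustrative assumptions, not part of the notebook's data), a Series can be given explicit labels and then read either by label or by position:

```python
import pandas as pd

# Hypothetical ratings keyed by phone name (illustrative values only)
ratings = pd.Series(
    [4.5, 5.0, 3.7],
    index=["iphone 10", "Note 8", "Mate 30 pro"],
    name="Rating",
)

# Label-based lookup uses the index
note8_rating = ratings["Note 8"]

# Position-based lookup uses iloc
first_rating = ratings.iloc[0]

print(ratings)
print(note8_rating, first_rating)
```

When no index is passed, as in the cell above, pandas falls back to the default integer labels 0, 1, 2, and so on.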

### Creating a DataFrame:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. 
[ [pandas-doc-link](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe) ]

##### 1. Creating a DataFrame from Dictionary.


In [0]:
# Dictionary
#     my_dict = {  
#         key:value,
#         key:value,
#         key:value,
#         ...
#     }
# Create our dictionary
phone_data = {
    "SmartPhone":("iphone 10","Note 8","Mate 30 pro","iphone 4s","Galaxy S20 Ultra"),
    "Company":("Apple","Samsung","Huawei","Apple","Samsung"),
    "Rating":("4.5","5.0","3.7","4.0","4.9")
}

In [0]:
# DATAFRAME
df = pd.DataFrame(phone_data)
df
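
Note that the ratings in the dictionary are strings, so pandas does not treat that column as numeric. The sketch below, which uses a shortened copy of the data under the assumed name `df_demo`, shows one way to convert such a column with `pd.to_numeric` before doing arithmetic on it:

```python
import pandas as pd

# Shortened copy of the phone data; the ratings are strings, as in the cell above
phone_data = {
    "SmartPhone": ["iphone 10", "Note 8", "Mate 30 pro"],
    "Company": ["Apple", "Samsung", "Huawei"],
    "Rating": ["4.5", "5.0", "3.7"],
}
df_demo = pd.DataFrame(phone_data)

# The string column is not numeric yet
was_numeric_before = pd.api.types.is_numeric_dtype(df_demo["Rating"])

# Convert the strings to floats so that numeric operations work
df_demo["Rating"] = pd.to_numeric(df_demo["Rating"])
mean_rating = df_demo["Rating"].mean()

print(was_numeric_before, df_demo["Rating"].dtype, mean_rating)
```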

##### 2. Creating a DataFrame from Lists.

In [0]:
# LIST
smartPhone = ["iphone 10","Note 8","Mate 30 pro","iphone 4s","Galaxy S20 Ultra"]
company = ["Apple","Samsung","Huawei","Apple","Samsung"]
Rating = ["4.5","5.0","3.7","4.0","4.9"]

In [0]:
# zip() pairs up the i-th elements of each list into a tuple
for i in zip(smartPhone,company,Rating):
    print(i)

In [0]:
# DATAFRAME
df = pd.DataFrame(zip(smartPhone,company,Rating))
df

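What about the column names? Without them, pandas labels the columns 0, 1, 2. We can pass our own names through the `columns` argument: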

In [0]:
# Add column names to our DataFrame
df = pd.DataFrame(zip(smartPhone,company,Rating), columns=["SmartPhone","Company","Rating"])
df

### Renaming Columns

In [0]:
# Get all the current column names in a list 
column_names = list(df.columns)
column_names

In [0]:
# Renaming all columns
df.columns = ["Smart_Phone","sp_company","RATING"]
df

**Rename specific columns:**

In [0]:
# rename() returns a new DataFrame; df keeps its current column names unless the result is assigned back
df.rename(columns={'sp_company': 'Company', 'RATING': 'Rating'})
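
Because `rename()` returns a new DataFrame, the original keeps its old column names unless the result is assigned back. A small sketch with a one-row stand-in DataFrame (the names `df_demo` and `renamed` are assumptions for illustration):

```python
import pandas as pd

# One-row stand-in for the phone DataFrame
df_demo = pd.DataFrame({"sp_company": ["Apple"], "RATING": ["4.5"]})

# rename() leaves df_demo untouched and returns a renamed copy
renamed = df_demo.rename(columns={"sp_company": "Company", "RATING": "Rating"})

print(list(df_demo.columns))  # the original names are still in place
print(list(renamed.columns))  # the copy carries the new names
```

This is why the cells that follow still refer to the columns as `sp_company` and `RATING`.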

## Operations on Columns :
We can perform several operations on columns: **selecting**, **deleting** and **adding**.
### For selecting a column :
Syntax = dataframe_name["Column Name"]

In [0]:
# Print the existing DataFrame
df

In [0]:
# Select the "RATING" column
df["RATING"]
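
Selecting with a single name returns a Series, while a list of names returns a DataFrame. Deleting a column is not demonstrated in this notebook; one common way is `drop(columns=...)`. The sketch below uses a shortened stand-in DataFrame under the assumed name `demo`:

```python
import pandas as pd

# Shortened stand-in for the phone DataFrame
demo = pd.DataFrame({
    "Smart_Phone": ["iphone 10", "Note 8"],
    "sp_company": ["Apple", "Samsung"],
    "RATING": ["4.5", "5.0"],
})

# A single column name gives a Series
one_column = demo["RATING"]

# A list of column names gives a DataFrame
two_columns = demo[["Smart_Phone", "RATING"]]

# drop() returns a new DataFrame without the listed columns
without_company = demo.drop(columns=["sp_company"])

print(type(one_column).__name__, type(two_columns).__name__)
print(list(without_company.columns))
```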

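### Adding new row(s) to a DataFrame :
Older pandas versions provided the [append() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html#pandas-dataframe-append) to append new row(s) to an existing DataFrame. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; [pd.concat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) is its replacement.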

In [0]:
df

In [0]:
# New Data to add to DataFrame
new_data=[
    {
    "Smart_Phone":"ipad",
    "sp_company":"Apple",
    "RATING":"3.6"
    },
    {
    "Smart_Phone":"ipad 2",
    "sp_company":"Apple",
    "RATING":"4.1"
    }
]

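In [0]:
# DataFrame.append() was removed in pandas 2.0, so pd.concat() is used here instead.
# Without ignore_index, the new rows keep their own index labels (0 and 1).
pd.concat([df, pd.DataFrame(new_data)])

In [0]:
# ignore_index=True renumbers the rows of the result from 0 to n-1
df = pd.concat([df, pd.DataFrame(new_data)], ignore_index=True)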

In [0]:
df

### Adding new column(s) to a DataFrame :

The [assign() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html#pandas-dataframe-assign) adds a new column to an existing DataFrame. It returns a new DataFrame and leaves the original unchanged.

In [0]:
ram=[2,4,3,1,6,2,4]
df.assign(Smart_RAM=ram)

Adding multiple columns :


In [0]:
ram=[2,4,3,1,6,2,5]
storage=[32,128,64,16,64,15,12]
battery=[2000,3000,6000,2200,4000,200,1200]

df = df.assign(**{'RAM':ram,'Storage':storage,'Battery':battery})

In [0]:
df

You can also add a new column by direct assignment, which modifies the DataFrame in place:

In [0]:
num_camera=[2,2,3,1,4,2,4]
df['num_camera']= num_camera
df
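
The difference between the two approaches can be sketched on a two-row stand-in DataFrame. The names `df_demo`, `with_ram` and `RAM_MB` are assumptions for illustration, and the callable passed to `assign()` shows how a column can be derived from existing ones:

```python
import pandas as pd

df_demo = pd.DataFrame({"Smart_Phone": ["ipad", "ipad 2"]})

# assign() returns a new DataFrame; df_demo itself is unchanged
with_ram = df_demo.assign(RAM=[2, 4])
unchanged = "RAM" not in df_demo.columns

# Bracket assignment modifies df_demo in place
df_demo["RAM"] = [2, 4]

# assign() also accepts a callable that receives the DataFrame,
# here deriving a hypothetical RAM_MB column from RAM
df_demo = df_demo.assign(RAM_MB=lambda d: d["RAM"] * 1024)

print(unchanged)
print(df_demo)
```

In both cases the new values must have the same length as the DataFrame, which is why the lists above contain one entry per row.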

### Reading a csv file: 
To read CSV files with pandas, we use the function [pd.read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas-read-csv), passing it the path of the CSV file to import.

In [0]:
raw_df = pd.read_csv('titanic.csv')
raw_df
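
If `titanic.csv` is not at hand, `read_csv()` can be tried on in-memory text, since it accepts file-like objects as well as paths. The rows below are made-up placeholders, not real Titanic records; `usecols` and `nrows` limit which columns and how many rows are read:

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """Name,Sex,Age
Passenger A,male,22
Passenger B,female,
Passenger C,female,26
"""

# Read everything; the empty Age field becomes NaN
sample_df = pd.read_csv(io.StringIO(csv_text))

# Read only two columns and the first two rows
name_age = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"], nrows=2)

print(sample_df)
print(name_age)
```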

### Analyse the first 5 rows of our Dataset:
For a quick view of the first 5 rows of our dataset, we use the [head() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html#pandas-dataframe-head).


In [0]:
raw_df.head()

In [0]:
# View the first n rows (here n = 15)
raw_df.head(15)

### Analyse the **last 5 rows** of our Dataset:

For a quick view of the last 5 rows of our dataset, we use the [tail() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html#pandas-dataframe-tail).


In [0]:
raw_df.tail()

In [0]:
# View the last n rows (here n = 10)
raw_df.tail(10)

### Get a summary of the distribution of data in our Dataset :

The [describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html#pandas-dataframe-describe) function calculates descriptive statistics such as the **count, mean, standard deviation, minimum, maximum and percentiles of the numerical column values** of a Series or DataFrame.

In [0]:
raw_df.describe()
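
By default `describe()` skips non-numeric columns. As a sketch on a made-up four-row dataset (column names follow the Titanic examples, values are illustrative), passing `include="all"` also summarises text columns:

```python
import pandas as pd

# Made-up mini dataset; values are illustrative only
demo = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Default: statistics for the numeric columns only
numeric_summary = demo.describe()

# include="all" adds count/unique/top/freq rows for the text columns
full_summary = demo.describe(include="all")

print(numeric_summary)
print(full_summary)
```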

### Dataset Summary :
The [info() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html#pandas-dataframe-info) prints a short summary of our dataset: each column's datatype, the number of non-null entries, the total number of columns and rows, and the memory usage.

In [0]:
raw_df.info()

### Getting unique entries in any column :
The [value_counts()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html#pandas-series-value-counts) function returns a Series containing the count of each unique value in the selected column.
<br><br>

*Syntax* = dataframe_name["Column Name"].value_counts()

In [0]:
# Count of each unique value in the Sex column
raw_df['Sex'].value_counts()
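
A few variations on `value_counts()` are often useful. The sketch below runs on a made-up Series (one value is deliberately missing): `normalize=True` returns proportions instead of counts, `dropna=False` also counts missing values, and `nunique()` gives the number of distinct values.

```python
import pandas as pd

# Made-up column with one missing entry
sex = pd.Series(["male", "female", "male", None], name="Sex")

counts = sex.value_counts()                      # missing values are ignored
shares = sex.value_counts(normalize=True)        # proportions of the non-missing values
counts_with_na = sex.value_counts(dropna=False)  # missing values get their own row
n_unique = sex.nunique()                         # number of distinct non-missing values

print(counts)
print(shares)
print(counts_with_na)
print(n_unique)
```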

### Get total number of null values in every column :
The [isna() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html#pandas-dataframe-isna) checks every cell of the DataFrame and returns True where the value is null and False otherwise. Summing the result column by column gives the number of null values in each column.

In [0]:
raw_df

In [0]:
print("Number of NaN values in each column:")
# isna() marks null cells as True; sum() counts the True values in each column
raw_df.isna().sum()

### Remove null values :
To drop the rows or columns that contain null values, we use the [dropna() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas-dataframe-dropna).


In [0]:
# Total Length (Rows) of our dataFrame
print("Total number of rows :", len(raw_df))

In [0]:
# Drop every row that contains at least one null value (returns a new DataFrame)
raw_df.dropna()

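**!** dropna() parameters: `how='any'` (the default) drops a row if any of its values is null, while `how='all'` drops a row only if all of its values are null.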

In [92]:
print("Number of rows in DataFrame after removing the rows in which all values are null:")

len(raw_df.dropna(how='all'))


Number of rows in DataFrame after removing the rows in which all values are null:


891
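
The effect of the main `dropna()` parameters can be sketched on a made-up four-row DataFrame (the `Cabin` column and all values are assumptions for illustration). `how` chooses between dropping on any or all nulls, `subset` restricts the check to certain columns, and `thresh` keeps rows with at least that many non-null values:

```python
import pandas as pd

# Made-up data: row 0 has 1 null, row 1 has 2, row 2 is entirely null, row 3 has none
demo = pd.DataFrame({
    "Name": ["A", "B", None, "D"],
    "Age": [22, None, None, 30],
    "Cabin": [None, None, None, "C85"],
})

rows_any = len(demo.dropna())                   # drop rows with any null
rows_all = len(demo.dropna(how="all"))          # drop rows where every value is null
rows_subset = len(demo.dropna(subset=["Age"]))  # only look at the Age column
rows_thresh = len(demo.dropna(thresh=2))        # keep rows with at least 2 non-null values

print(rows_any, rows_all, rows_subset, rows_thresh)
```

None of these calls modifies `demo`; each returns a new DataFrame.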

### Replace null values :
We use the [fillna() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas-dataframe-fillna) to replace null values.
<br><br>
In our case, let's **fill the null values in the Age** column with the **average** age.

In [0]:
# Check the existing column names in our DataFrame:
raw_df.columns

In [0]:
# Replace all null values in the Age column with the average age of the passengers.
raw_df["Age"] = raw_df["Age"].fillna(raw_df['Age'].mean())

Check the number of null values across all columns :

In [0]:
# Check the null values present across all columns
raw_df.isna().sum()
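
The same idea can be sketched on a made-up three-row DataFrame (the `Cabin` column, the `"Unknown"` placeholder and all values are assumptions). `mean()` skips nulls when computing the average, and `fillna()` also accepts a dictionary that maps each column to its own fill value:

```python
import pandas as pd

# Made-up data with nulls in both columns
demo = pd.DataFrame({
    "Age": [20.0, None, 40.0],
    "Cabin": ["C85", None, None],
})

# mean() ignores the null, so the average of 20 and 40 is used
mean_age = demo["Age"].mean()
filled_age = demo["Age"].fillna(mean_age)

# A dict gives a separate fill value per column
filled_all = demo.fillna({"Age": demo["Age"].median(), "Cabin": "Unknown"})

print(filled_age.tolist())
print(filled_all)
```

Whether the mean, the median or some other value is the right replacement depends on the data; the mean is used in this notebook for simplicity.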

### Slicing in Pandas:

For selecting specific rows or a range of rows from our dataset, we use slicing. This can be done using either label or integer-based indexing.
1. Integer based indexing :

In [0]:
raw_df.head(12)

Selecting a specific row index  :

In [0]:
# For viewing the row at index value = 5
raw_df.iloc[5]

In [0]:
# Select the rows at positions 10 to 20 (the end of an iloc slice is exclusive, hence 21)
raw_df.iloc[10:21]
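
One detail worth keeping in mind: `iloc` slices exclude the end position, whereas `loc` slices include the end label. A minimal sketch on a made-up 30-row DataFrame (the names `demo` and `shifted` are assumptions):

```python
import pandas as pd

# 30 rows with the default index 0..29
demo = pd.DataFrame({"value": range(100, 130)})

by_position = demo.iloc[10:21]  # positions 10..20, end is exclusive
by_label = demo.loc[10:20]      # labels 10..20, end is inclusive

# With a non-default index, labels and positions no longer coincide
shifted = demo.copy()
shifted.index = range(100, 130)
same_rows = shifted.loc[110:120].equals(shifted.iloc[10:21])

print(len(by_position), len(by_label), same_rows)
```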

2. label based:
<br>
The loc function is primarily used for label-based selection. Integers can be used, but they are interpreted as labels, not as positions. loc also accepts a boolean condition.
<br><br>
In our case, let's select all the passengers with Age greater than 20.

In [0]:
# Returns a DataFrame with the passengers whose "Age" is greater than 20.
raw_df.loc[ raw_df['Age'] > 20 ]

How about multi-conditional searching ? 

In [0]:
# Returns a DataFrame with the passengers whose "Age" is greater than 20 and whose "Sex" is "male".
raw_df.loc[ (raw_df['Age']>20) & (raw_df['Sex'] == 'male' ) ]

Can we select two different columns using loc ? 

In [0]:
columns_to_select = ['Name', 'Age']
raw_df.loc[ :, columns_to_select ]
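
Row conditions and column selection can be combined in a single `loc` call. The sketch below uses a made-up four-row DataFrame; `&` means "and", `|` means "or", `~` negates a condition, and each condition needs its own parentheses:

```python
import pandas as pd

# Made-up passengers; values are illustrative only
demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [19, 35, 42, 28],
    "Sex": ["male", "male", "female", "female"],
})

# Rows: older than 20 AND male; columns: Name and Age only
adult_men = demo.loc[(demo["Age"] > 20) & (demo["Sex"] == "male"), ["Name", "Age"]]

# Rows: older than 40 OR male
older_or_male = demo.loc[(demo["Age"] > 40) | (demo["Sex"] == "male")]

# Rows: NOT male
not_male = demo.loc[~(demo["Sex"] == "male")]

print(adult_men)
print(older_or_male)
print(not_male)
```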

<strong>! Also ...</strong>
How can we get all the rows that have at least one null value in our DataFrame?
<br><br>
This can be done using the following two functions :
<br>

1. [pandas.isnull()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isnull.html#pandas-isnull)
2. [DataFrame.any()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html#pandas-dataframe-any)


In [0]:
# Rows of the DataFrame that contain at least one null value
raw_df[pd.isnull(raw_df).any(axis=1)]
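
`any(axis=1)` flags a row as soon as it holds a single null. To keep only the rows with more than one null, one option is to count the nulls per row instead. A sketch on a made-up four-row DataFrame (the `Cabin` column and all values are assumptions):

```python
import pandas as pd

# Made-up data: row 0 has 1 null, row 1 has 2, row 2 has 3, row 3 has none
demo = pd.DataFrame({
    "Name": ["A", "B", None, "D"],
    "Age": [22, None, None, 30],
    "Cabin": [None, None, None, "C85"],
})

# Rows with at least one null value
at_least_one = demo[demo.isna().any(axis=1)]

# Rows with more than one null value: count the nulls in each row first
nulls_per_row = demo.isna().sum(axis=1)
more_than_one = demo[nulls_per_row > 1]

print(nulls_per_row.tolist())
print(len(at_least_one), len(more_than_one))
```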

<br><br>
You have reached the end of this Python Notebook!
<hr>