# Pandas
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
* Created in 2008 by Wes McKinney.
* Open source, released under the new (3-clause) BSD license.



# The DataFrame Data Structure

A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.<br>
In practice, a DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, a text file, or an Excel file. A DataFrame can also be created from lists, from a dictionary, or from a list of dictionaries.
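
As a minimal sketch of the in-memory options mentioned above, the same small table can be built from a list of lists, a dictionary, or a list of dictionaries. The sample names and values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
rows = [["Chris", 22.5], ["Kevyn", 2.5]]

# From a list of lists: column names are passed explicitly.
df_from_lists = pd.DataFrame(rows, columns=["Name", "Cost"])

# From a dictionary: keys become column names, values become the columns.
df_from_dict = pd.DataFrame({"Name": ["Chris", "Kevyn"], "Cost": [22.5, 2.5]})

# From a list of dictionaries: each dictionary becomes one row.
df_from_records = pd.DataFrame([
    {"Name": "Chris", "Cost": 22.5},
    {"Name": "Kevyn", "Cost": 2.5},
])

# All three constructions describe the same table.
print(df_from_lists.equals(df_from_dict))
print(df_from_dict.equals(df_from_records))
```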

### Creating a DataFrame using Series.

In [0]:
# Creating a DataFrame from several Series
import pandas as pd
# Series1
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
# Series2
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
# Series3
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
# Combine all the Series into a DataFrame; each Series becomes one row.
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df

Unnamed: 0,Name,Item Purchased,Cost
Store 1,Chris,Dog Food,22.5
Store 1,Kevyn,Kitty Litter,2.5
Store 2,Vinod,Bird Seed,5.0


**Inserting a new column in a DataFrame using the insert() method.<br>**
**Syntax**:<br>
               df.insert(location, "column_name", value, allow_duplicates=False)

In [0]:
# Insert a new column "Age" at position 2 (positions start at 0).
df.insert(2, "Age", [21, 23, 24], allow_duplicates=True)
df

Unnamed: 0,Name,Item Purchased,Age,Cost
Store 1,Chris,Dog Food,21,22.5
Store 1,Kevyn,Kitty Litter,23,2.5
Store 2,Vinod,Bird Seed,24,5.0
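
The `allow_duplicates` argument matters only when the column name already exists. The sketch below uses a small stand-in DataFrame (`demo` is a hypothetical name, not the `df` from the cells above) to show that a duplicate name is rejected by default.

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"Name": ["Chris", "Kevyn"], "Cost": [22.5, 2.5]})

# Insert "Age" at position 1, between "Name" and "Cost".
demo.insert(1, "Age", [21, 23])
print(list(demo.columns))

# Inserting a column whose name already exists raises ValueError
# unless allow_duplicates=True is passed.
try:
    demo.insert(0, "Age", [30, 31])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```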


In [0]:
print(df.items())

<generator object DataFrame.items at 0x7f1286b42d00>
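
Printing `df.items()` shows only the generator object. To see what it yields, iterate over it: each step gives a column label and the column as a Series. A minimal sketch with hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"Name": ["Chris", "Kevyn"], "Cost": [22.5, 2.5]})

# Each iteration yields (column label, column as a Series).
column_types = {}
for label, column in demo.items():
    column_types[label] = type(column).__name__
    print(label, column.tolist())
```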


In [0]:
# Alternative way to add a new column at the end.
address = ['gwalior', 'morena', 'bhind']
df['Adress'] = address
df

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress
Store 1,Chris,Dog Food,21,22.5,gwalior
Store 1,Kevyn,Kitty Litter,23,2.5,morena
Store 2,Vinod,Bird Seed,24,5.0,bhind


In [0]:
df['Branch'] = ['IT','Cs','EC']
df

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 1,Chris,Dog Food,21,22.5,gwalior,IT
Store 1,Kevyn,Kitty Litter,23,2.5,morena,Cs
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC


### Querying a DataFrame.

In [0]:
# Print the names of everyone whose age is greater than 21.
df['Name'][df['Age'] > 21]

Store 1    Kevyn
Store 2    Vinod
Name: Name, dtype: object

In [0]:
# Print the Name and Branch of every row whose cost is greater than or equal to 5.0.
df[['Name', 'Branch']][df['Cost'] >= 5.0]

Unnamed: 0,Name,Branch
Store 1,Chris,IT
Store 2,Vinod,EC
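
The queries above chain two indexing steps. An alternative, sketched here on a hypothetical stand-in DataFrame, is to pass the row mask and the column selection to `.loc` in one call, which also makes it easy to combine conditions.

```python
import pandas as pd

# Hypothetical stand-in data mirroring the example above.
demo = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"],
     "Age": [21, 23, 24],
     "Cost": [22.5, 2.5, 5.0]},
    index=["Store 1", "Store 1", "Store 2"],
)

# Row mask and column selection in a single .loc call.
older = demo.loc[demo["Age"] > 21, "Name"]

# Conditions are combined with & (and) or | (or); each one needs parentheses.
older_and_pricey = demo.loc[
    (demo["Age"] > 21) & (demo["Cost"] >= 5.0), ["Name", "Cost"]
]

print(older.tolist())
print(older_and_pricey)
```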


In [0]:
# Select all columns of the DataFrame.
df[df.columns]

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 1,Chris,Dog Food,21,22.5,gwalior,IT
Store 1,Kevyn,Kitty Litter,23,2.5,morena,Cs
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC


In [0]:
# Select the columns starting at position 1 and ending before position 4.
df[df.columns[1:4]]

Unnamed: 0,Item Purchased,Age,Cost
Store 1,Dog Food,21,22.5
Store 1,Kitty Litter,23,2.5
Store 2,Bird Seed,24,5.0


**Note:**
* To query by numeric location, starting at zero, use the iloc[] indexer.
* To query by index label, use the loc[] indexer.
* The loc[] indexer also supports slicing.
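
The difference between the two indexers shows up most clearly when slicing. In this sketch the index labels `"a"`, `"b"`, `"c"` are hypothetical and chosen to be unique.

```python
import pandas as pd

# Hypothetical stand-in data with unique index labels.
demo = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"], "Cost": [22.5, 2.5, 5.0]},
    index=["a", "b", "c"],
)

by_position = demo.iloc[0:2]    # positions 0 and 1; the end position is excluded
by_label = demo.loc["a":"b"]    # labels "a" through "b"; the end label is included
single_value = demo.loc["c", "Cost"]  # one row label and one column label

print(by_position)
print(by_label)
print(single_value)
```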


In [0]:
# Print the details of Store 2.
df.loc['Store 2']

Name                  Vinod
Item Purchased    Bird Seed
Age                      24
Cost                      5
Adress                bhind
Branch                   EC
Name: Store 2, dtype: object

In [0]:
df.iloc[2]

Name                  Vinod
Item Purchased    Bird Seed
Age                      24
Cost                      5
Adress                bhind
Branch                   EC
Name: Store 2, dtype: object

In [0]:
# A single matching row is returned as a Series.
type(df.loc['Store 2'])

pandas.core.series.Series

In [0]:
df.loc['Store 1']

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 1,Chris,Dog Food,21,22.5,gwalior,IT
Store 1,Kevyn,Kitty Litter,23,2.5,morena,Cs


In [0]:
# Print the costs for Store 1.
df.loc['Store 1', 'Cost']

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64

In [0]:
# Transpose the DataFrame, treating it as a matrix: rows become columns and vice versa.
df.T

Unnamed: 0,Store 1,Store 1.1,Store 2
Name,Chris,Kevyn,Vinod
Item Purchased,Dog Food,Kitty Litter,Bird Seed
Age,21,23,24
Cost,22.5,2.5,5
Adress,gwalior,morena,bhind
Branch,IT,Cs,EC


In [0]:
df.T.loc['Cost']

Store 1    22.5
Store 1     2.5
Store 2       5
Name: Cost, dtype: object
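
Note that the transposed result above has dtype `object` rather than `float64`: after transposing, each column mixes text and numbers. A small sketch on hypothetical stand-in data makes the contrast explicit.

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"Name": ["Chris", "Kevyn"], "Cost": [22.5, 2.5]})

print(demo.dtypes)    # one dtype per column before transposing
print(demo.T.dtypes)  # after transposing, every column mixes text and numbers

# Selecting the column directly keeps its numeric dtype.
cost_direct = demo["Cost"]
cost_via_transpose = demo.T.loc["Cost"]
```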

In [0]:
df['Cost']   # access the cost of all items

Store 1    22.5
Store 1     2.5
Store 2     5.0
Name: Cost, dtype: float64

In [0]:
df.loc['Store 1']['Cost']     # access only the Store 1 costs

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64

In [0]:
# Print the name and cost for all rows.
df.loc[:, ['Name', 'Cost']]     # .loc also supports slicing

Unnamed: 0,Name,Cost
Store 1,Chris,22.5
Store 1,Kevyn,2.5
Store 2,Vinod,5.0


**.drop()-** The .drop() method can delete entire rows or columns.<br>
* To drop a row, pass the row's index label.<br>
* To drop a column, pass the column name together with axis = 1, or use the columns keyword.
 For example<br>
 Drop rows - df.drop([0, 1], axis = 0)<br>
 Drop columns - df.drop(['Name', 'Age'], axis = 1), or equivalently df.drop(columns = ['Name', 'Age'])
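
The following sketch runs both kinds of drop on a hypothetical stand-in DataFrame. It uses the `index=` and `columns=` keywords, which are an alternative to passing labels together with `axis`.

```python
import pandas as pd

# Hypothetical stand-in data mirroring the example above.
demo = pd.DataFrame(
    {"Name": ["Chris", "Kevyn", "Vinod"],
     "Age": [21, 23, 24],
     "Cost": [22.5, 2.5, 5.0]},
    index=["Store 1", "Store 1", "Store 2"],
)

# Drops every row labelled "Store 1" (same as demo.drop("Store 1", axis=0)).
without_store_1 = demo.drop(index="Store 1")

# Drops the "Age" column (same as demo.drop("Age", axis=1)).
without_age = demo.drop(columns=["Age"])

# Both calls return new DataFrames; demo itself keeps all rows and columns.
print(without_store_1)
print(without_age)
print(demo.shape)
```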

In [0]:
# drop() does not change the DataFrame by default; it returns a copy with the given rows removed.
df.drop('Store 1', axis=0)


Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC


In [0]:
df   # the original DataFrame is unchanged by drop()

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 1,Chris,Dog Food,21,22.5,gwalior,IT
Store 1,Kevyn,Kitty Litter,23,2.5,morena,Cs
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC


In [0]:
copy_df = df.copy()
copy_df = copy_df.drop('Store 1')
copy_df

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC


In [0]:
del copy_df['Name']

In [0]:
copy_df      # The dropped rows and the deleted 'Name' column affect only the copy; df is untouched.

Unnamed: 0,Item Purchased,Age,Cost,Adress,Branch
Store 2,Bird Seed,24,5.0,bhind,EC


In [0]:
copy_df.drop?

In [0]:
# A second way to drop a column is the del keyword together with the indexing operator.
del copy_df['Adress']
copy_df

Unnamed: 0,Item Purchased,Age,Cost,Branch
Store 2,Bird Seed,24,5.0,EC


In [0]:
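# Assigning a scalar creates the column and fills every row with that value.
df['Location'] = None
df

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch,Location
Store 1,Chris,Dog Food,21,22.5,gwalior,IT,
Store 1,Kevyn,Kitty Litter,23,2.5,morena,Cs,
Store 2,Vinod,Bird Seed,24,5.0,bhind,EC,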


# DataFrame Indexing and Loading

In [0]:
costs = df['Cost']
costs

Store 1    22.5
Store 1     2.5
Store 2     5.0
Name: Cost, dtype: float64

In [0]:
# Add 2 to all the costs.
costs+=2
costs

Store 1    24.5
Store 1     4.5
Store 2     7.0
Name: Cost, dtype: float64

In [0]:
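# In older pandas versions, `costs` is a view on the column, so the in-place update also changes df.
# With Copy-on-Write enabled (the default from pandas 3.0), df would keep the original costs.
df

Unnamed: 0,Name,Item Purchased,Age,Cost,Adress,Branch,Location
Store 1,Chris,Dog Food,21,24.5,gwalior,IT,
Store 1,Kevyn,Kitty Litter,23,4.5,morena,Cs,
Store 2,Vinod,Bird Seed,24,7.0,bhind,EC,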
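
If the original DataFrame should stay untouched regardless of pandas version, one option is to take an explicit copy of the column before modifying it. A minimal sketch on hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical stand-in data mirroring the Cost column above.
demo = pd.DataFrame(
    {"Cost": [22.5, 2.5, 5.0]},
    index=["Store 1", "Store 1", "Store 2"],
)

# An explicit copy is independent of the original DataFrame.
costs = demo["Cost"].copy()
costs += 2

print(costs.tolist())         # the copy carries the update
print(demo["Cost"].tolist())  # the original column is unchanged
```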


### Creating a DataFrame using a Python dictionary.

In [17]:
import pandas as pd
import numpy as np
# Each key becomes a column name; each list holds that column's values.
df = pd.DataFrame({
    'S Name': ['Braj kishore', 'Adarsh Jadon', 'Aarif khan', 'Rajan', 'Chirayu', 'Karan'],
    'S Age': [19, 19, 19, 19, 19, 19],
    'S Mail id': ['brajkishoreprajapati@gmail.com', 'adarshjadon5@gmail.com', 'aarifkhan7@gmail.com', 'kajal12@gmail.com', 'chinuchatur14@gmail.com', 'karanraj1@gmail.com'],
    'S Phone number': [8963941798, 9340401400, 8770493574, 9131712320, 8889969924, 9131712939]})
df

Unnamed: 0,S Name,S Age,S Mail id,S Phone number
0,Braj kishore,19,brajkishoreprajapati@gmail.com,8963941798
1,Adarsh Jadon,19,adarshjadon5@gmail.com,9340401400
2,Aarif khan,19,aarifkhan7@gmail.com,8770493574
3,Rajan,19,kajal12@gmail.com,9131712320
4,Chirayu,19,chinuchatur14@gmail.com,8889969924
5,Karan,19,karanraj1@gmail.com,9131712939


**shape attribute** - Returns a tuple (number of rows, number of columns) representing the dimensions of the DataFrame.

In [20]:
# Returns the first 5 rows (the default).
df.head()

Unnamed: 0,S Name,S Age,S Mail id,S Phone number
0,Braj kishore,19,brajkishoreprajapati@gmail.com,8963941798
1,Adarsh Jadon,19,adarshjadon5@gmail.com,9340401400
2,Aarif khan,19,aarifkhan7@gmail.com,8770493574
3,Rajan,19,kajal12@gmail.com,9131712320
4,Chirayu,19,chinuchatur14@gmail.com,8889969924


In [19]:
# Returns the last 5 rows (the default).
df.tail()

Unnamed: 0,S Name,S Age,S Mail id,S Phone number
1,Adarsh Jadon,19,adarshjadon5@gmail.com,9340401400
2,Aarif khan,19,aarifkhan7@gmail.com,8770493574
3,Rajan,19,kajal12@gmail.com,9131712320
4,Chirayu,19,chinuchatur14@gmail.com,8889969924
5,Karan,19,karanraj1@gmail.com,9131712939


In [21]:
# Returns the dimensions of the DataFrame as (rows, columns).
df.shape

(6, 4)

In [22]:
# Returns the length of the DataFrame, i.e. the total number of rows.
len(df)

6
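
As a short sketch on hypothetical stand-in data: `shape` can be unpacked into row and column counts, and `head()` and `tail()` accept the number of rows to return.

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"Name": ["Chris", "Kevyn", "Vinod"], "Age": [21, 23, 24]})

n_rows, n_cols = demo.shape  # shape is an attribute, so no parentheses
first_two = demo.head(2)     # first 2 rows instead of the default 5
last_one = demo.tail(1)      # last row only

print(n_rows, n_cols)
print(first_two)
print(last_one)
```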

In [9]:
# .columns returns the column names of the dataframe.
df.columns

Index(['S Name', 'S Age', 'S Mail id', 'S Phone number'], dtype='object')

In [23]:
# Summary statistics for all numeric columns of the DataFrame.
df.describe()

Unnamed: 0,S Age,S Phone number
count,6.0,6.0
mean,19.0,9038039000.0
std,0.0,203945100.0
min,19.0,8770494000.0
25%,19.0,8908463000.0
50%,19.0,9047827000.0
75%,19.0,9131713000.0
max,19.0,9340401000.0
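
The phone numbers are summarized above only because they are stored as integers; their mean and standard deviation carry no meaning. One possible approach, sketched here with hypothetical placeholder numbers, is to store such identifiers as text and use `include="all"` when a summary of non-numeric columns is wanted.

```python
import pandas as pd

# Hypothetical placeholder data for illustration.
demo = pd.DataFrame({
    "Age": [19, 19, 19],
    "Phone": [5550100, 5550101, 5550102],
})

# Treat phone numbers as text so they are left out of numeric summaries.
demo["Phone"] = demo["Phone"].astype(str)

numeric_summary = demo.describe()            # numeric columns only
full_summary = demo.describe(include="all")  # also count/unique/top/freq for text columns

print(numeric_summary)
print(full_summary)
```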


In [24]:
# Prints a summary of the columns: non-null counts, dtypes, and memory usage.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
S Name            6 non-null object
S Age             6 non-null int64
S Mail id         6 non-null object
S Phone number    6 non-null int64
dtypes: int64(2), object(2)
memory usage: 320.0+ bytes


### Renaming columns.

In [27]:
# Map old column names to new ones; inplace=True modifies df directly.
df.rename(columns={'S Name': 'Name', 'S Age': 'Age', 'S Mail id': 'Email', 'S Phone number': 'Mobile No.'}, inplace=True)
df


Unnamed: 0,Name,Age,Email,Mobile No.
0,Braj kishore,19,brajkishoreprajapati@gmail.com,8963941798
1,Adarsh Jadon,19,adarshjadon5@gmail.com,9340401400
2,Aarif khan,19,aarifkhan7@gmail.com,8770493574
3,Rajan,19,kajal12@gmail.com,9131712320
4,Chirayu,19,chinuchatur14@gmail.com,8889969924
5,Karan,19,karanraj1@gmail.com,9131712939
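
`rename` can also be used without `inplace=True`, in which case it returns a new DataFrame, and it accepts a function instead of a mapping. A minimal sketch on hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"S Name": ["Chris"], "S Age": [21]})

# Without inplace=True, rename returns a new DataFrame and leaves demo as it was.
renamed = demo.rename(columns={"S Name": "Name", "S Age": "Age"})

# A function can be passed instead of a mapping; here the leading "S " is removed.
stripped = demo.rename(columns=lambda label: label.replace("S ", "", 1))

print(list(renamed.columns))
print(list(stripped.columns))
print(list(demo.columns))
```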
