# Pandas

- Pandas is a Python library.
- Pandas is used for analyzing, cleaning, and manipulating data.
- There are two core objects in pandas: the `pandas.DataFrame` (2D) and the `pandas.Series` (1D).

<figure>
<center>
<img src='https://www.w3resource.com/w3r_images/pandas-data-frame.svg' />
</center>
</figure>

### Series 
- A Pandas Series is like a column in a table: a one-dimensional array that can hold data of any type.


### DataFrame
- A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
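
A quick way to see the difference in dimensionality is to compare `ndim` and `shape`. The sketch below uses small made-up values. It also shows that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series is one-dimensional: a sequence of values plus an index.
s = pd.Series([10, 20, 30])

# A DataFrame is two-dimensional: rows and columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame returns a Series.
col = df["a"]
print(type(col).__name__)  # Series
```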

## Installation of Pandas

In [1]:
%pip install pandas
# (from a terminal, run: pip install pandas)




## How to use pandas
- Once Pandas is installed, import it into your code with the `import` keyword. By convention it is imported under the alias `pd`.

In [2]:
import pandas

# or

import pandas as pd
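
To confirm that the import worked, one option is to print the installed version string. The exact value depends on your environment.

```python
import pandas as pd

# __version__ holds the installed pandas version as a string.
print(pd.__version__)
```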

## Creating Series and DataFrames

## Creating a Series

In [3]:
a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(type(myvar))
myvar

<class 'pandas.core.series.Series'>


x    1
y    7
z    2
dtype: int64
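
Because `myvar` was given the labels `x`, `y` and `z`, its values can be looked up by label as well as by position. A minimal sketch (`from_dict` is just an illustrative name):

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

print(myvar["y"])      # 7  (label lookup)
print(myvar.loc["z"])  # 2  (explicit label-based access)
print(myvar.iloc[0])   # 1  (position-based access)

# A dict works too: its keys become the index labels.
from_dict = pd.Series({"x": 1, "y": 7, "z": 2})
print(from_dict.equals(myvar))  # True
```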

## Creating a Series from a list with custom labels

In [4]:
mylist = [1, 2, 3]
m = pd.Series(mylist, index=["one", "two", "three"])
m

one      1
two      2
three    3
dtype: int64
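
The same flat list can also be turned into a DataFrame, where it becomes a single column. In this sketch the column name `value` is an arbitrary choice:

```python
import pandas as pd

mylist = [1, 2, 3]

# A flat list becomes one column; `columns` names it and `index` labels the rows.
df = pd.DataFrame(mylist, index=["one", "two", "three"], columns=["value"])
print(df)
print(df.shape)  # (3, 1)
```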

## Creating a DataFrame from a nested list

In [5]:
data = [['tom', 10], ['nick', 15], ['juli', 14]]
  
df = pd.DataFrame(data, columns = ['Name', 'Age'])
  
df

|   | Name | Age |
|---|------|-----|
| 0 | tom  | 10  |
| 1 | nick | 15  |
| 2 | juli | 14  |
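
Another common input is a list of dictionaries, one per row. The sketch below rebuilds the same table that way and checks it against the nested-list version:

```python
import pandas as pd

# Nested list: each inner list is a row; column names are supplied separately.
data = [["tom", 10], ["nick", 15], ["juli", 14]]
from_lists = pd.DataFrame(data, columns=["Name", "Age"])

# List of dicts: each dict is a row and its keys become the column names.
records = [
    {"Name": "tom", "Age": 10},
    {"Name": "nick", "Age": 15},
    {"Name": "juli", "Age": 14},
]
from_records = pd.DataFrame(records)

print(from_records.equals(from_lists))  # True
```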


## Creating a DataFrame from a dictionary

In [6]:
d = {'id': [1, 2, 10], 
     'val1': ['a', 'b', 'c']}

d = pd.DataFrame(d)
d

|   | id | val1 |
|---|----|------|
| 0 | 1  | a    |
| 1 | 2  | b    |
| 2 | 10 | c    |
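
Each dictionary value becomes a column, so all the lists must have the same length. The helper `build` below is hypothetical; it only wraps the constructor to show what happens when the lengths differ:

```python
import pandas as pd

def build(columns):
    """Hypothetical helper: return a DataFrame, or None if the lists can't be aligned."""
    try:
        return pd.DataFrame(columns)
    except ValueError as err:
        # pandas rejects plain lists of unequal length as dict values
        print("ValueError:", err)
        return None

ok = build({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
bad = build({"id": [1, 2, 10], "val1": ["a", "b"]})

print(ok.shape)  # (3, 2)
print(bad)       # None
```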


## Creating a DataFrame with the zip function

In [7]:
name=["a","b","c","d"]
number=[1,2,3,4]

data1=list(zip(name,number))
print(data1)

d=pd.DataFrame(data=data1,columns=["name","number"])
d

[('a', 1), ('b', 2), ('c', 3), ('d', 4)]


|   | name | number |
|---|------|--------|
| 0 | a    | 1      |
| 1 | b    | 2      |
| 2 | c    | 3      |
| 3 | d    | 4      |
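
The `zip` step is optional: passing a dictionary of the two lists gives the same DataFrame. A quick check:

```python
import pandas as pd

name = ["a", "b", "c", "d"]
number = [1, 2, 3, 4]

# zip pairs up the lists into row tuples: ('a', 1), ('b', 2), ...
via_zip = pd.DataFrame(list(zip(name, number)), columns=["name", "number"])

# A dict of lists builds the same table column by column.
via_dict = pd.DataFrame({"name": name, "number": number})

print(via_zip.equals(via_dict))  # True
```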


## Adding new columns
### Method #1: Assigning a list to a new column

In [8]:
number=[4,3,2,1]

print(d)

d['number_rev']=number
d

  name  number
0    a       1
1    b       2
2    c       3
3    d       4


|   | name | number | number_rev |
|---|------|--------|------------|
| 0 | a    | 1      | 4          |
| 1 | b    | 2      | 3          |
| 2 | c    | 3      | 2          |
| 3 | d    | 4      | 1          |
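
When a plain list is assigned, it must contain exactly one value per row. A new column can also be computed from existing ones instead of typed in by hand. The sketch below reproduces `number_rev` arithmetically (the `5 - number` formula only works for this particular data) and uses `DataFrame.assign`, which returns a new DataFrame rather than modifying the original. `number_sq` is an illustrative name.

```python
import pandas as pd

d = pd.DataFrame({"name": ["a", "b", "c", "d"], "number": [1, 2, 3, 4]})

# Arithmetic on a column is applied element-wise: [4, 3, 2, 1]
d["number_rev"] = 5 - d["number"]

# assign() returns a new DataFrame; `d` itself is left unchanged.
d2 = d.assign(number_sq=d["number"] ** 2)

print(d2)
print("number_sq" in d.columns)  # False
```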


### Method #2: Using `DataFrame.insert()`

In [9]:
# insert a column named "rev_name" at position 1 (right after "name")
d.insert(1,"rev_name",["d","c","b","a"])
d

|   | name | rev_name | number | number_rev |
|---|------|----------|--------|------------|
| 0 | a    | d        | 1      | 4          |
| 1 | b    | c        | 2      | 3          |
| 2 | c    | b        | 3      | 2          |
| 3 | d    | a        | 4      | 1          |
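
`insert` changes the DataFrame in place (it returns `None`), and by default it refuses to add a column whose name already exists. A sketch on a fresh copy of the data:

```python
import pandas as pd

d = pd.DataFrame({"name": ["a", "b", "c", "d"], "number": [1, 2, 3, 4]})

# insert(loc, column, value): loc is the 0-based position of the new column.
result = d.insert(1, "rev_name", ["d", "c", "b", "a"])
print(result)              # None: d was modified in place
print(d.columns.tolist())  # ['name', 'rev_name', 'number']

# A duplicate column name raises ValueError unless allow_duplicates=True.
try:
    d.insert(0, "rev_name", ["x", "x", "x", "x"])
except ValueError as err:
    print("ValueError:", err)
```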


## DataFrame to CSV

In [10]:
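# writes the index as an extra, unlabeled first column
d.to_csv("new.csv")
# index=False leaves the index out of the file
d.to_csv('your.csv', index=False)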
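
The difference between the two calls is the index. With no path argument, `to_csv` returns the CSV text, which makes it easy to inspect. This sketch uses `io.StringIO` in place of a real file and shows that `index_col=0` reads the saved index back as the index instead of as an extra column:

```python
import io
import pandas as pd

d = pd.DataFrame({"name": ["a", "b"], "number": [1, 2]})

# With no path, to_csv() returns the CSV text instead of writing a file.
with_index = d.to_csv()
without_index = d.to_csv(index=False)
print(with_index)     # first line is ",name,number": the index has no header
print(without_index)  # first line is "name,number"

# Reading the indexed version naively yields an extra "Unnamed: 0" column...
naive = pd.read_csv(io.StringIO(with_index))
# ...while index_col=0 turns that first column back into the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)

print(naive.columns.tolist())     # ['Unnamed: 0', 'name', 'number']
print(restored.columns.tolist())  # ['name', 'number']
```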

## Reading data from various sources

In [11]:
# new.csv was saved with its index, which comes back as a column named "Unnamed: 0"
e=pd.read_csv("new.csv")
e

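|   | Unnamed: 0 | name | rev_name | number | number_rev |
|---|------------|------|----------|--------|------------|
| 0 | 0          | a    | d        | 1      | 4          |
| 1 | 1          | b    | c        | 2      | 3          |
| 2 | 2          | c    | b        | 3      | 2          |
| 3 | 3          | d    | a        | 4      | 1          |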


In [12]:
# your.csv was saved with index=False, so there is no extra column
h=pd.read_csv("your.csv")
h

|   | name | rev_name | number | number_rev |
|---|------|----------|--------|------------|
| 0 | a    | d        | 1      | 4          |
| 1 | b    | c        | 2      | 3          |
| 2 | c    | b        | 3      | 2          |
| 3 | d    | a        | 4      | 1          |


In [13]:
# read a CSV file directly from a URL
URL="https://raw.githubusercontent.com/coderanandmaurya/PANDAS/main/matches.csv"
matches_df=pd.read_csv(URL)
matches_df.head(4)

|   | id | season | city | date | team1 | team2 | toss_winner | toss_decision | result | dl_applied | winner | win_by_runs | win_by_wickets | player_of_match | venue | umpire1 | umpire2 | umpire3 |
|---|----|--------|------|------|-------|-------|-------------|---------------|--------|------------|--------|-------------|----------------|-----------------|-------|---------|---------|---------|
| 0 | 1 | 2017 | Hyderabad | 2017-04-05 | Sunrisers Hyderabad | Royal Challengers Bangalore | Royal Challengers Bangalore | field | normal | 0 | Sunrisers Hyderabad | 35 | 0 | Yuvraj Singh | Rajiv Gandhi International Stadium, Uppal | AY Dandekar | NJ Llong | NaN |
| 1 | 2 | 2017 | Pune | 2017-04-06 | Mumbai Indians | Rising Pune Supergiant | Rising Pune Supergiant | field | normal | 0 | Rising Pune Supergiant | 0 | 7 | SPD Smith | Maharashtra Cricket Association Stadium | A Nand Kishore | S Ravi | NaN |
| 2 | 3 | 2017 | Rajkot | 2017-04-07 | Gujarat Lions | Kolkata Knight Riders | Kolkata Knight Riders | field | normal | 0 | Kolkata Knight Riders | 0 | 10 | CA Lynn | Saurashtra Cricket Association Stadium | Nitin Menon | CK Nandan | NaN |
| 3 | 4 | 2017 | Indore | 2017-04-08 | Rising Pune Supergiant | Kings XI Punjab | Kings XI Punjab | field | normal | 0 | Kings XI Punjab | 0 | 6 | GJ Maxwell | Holkar Cricket Stadium | AK Chaudhary | C Shamshuddin | NaN |
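
After loading a file, it is worth checking its size, column types, and missing values. The `umpire3` column above has no values for these rows. The sketch below uses a tiny inline stand-in for `matches.csv` (a few columns copied from the rows above, not the real file) so it runs without network access:

```python
import io
import pandas as pd

# A tiny stand-in for matches.csv (subset of columns, first three rows).
csv_text = """id,season,city,umpire3
1,2017,Hyderabad,
2,2017,Pune,
3,2017,Rajkot,
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (3, 4)
print(df.dtypes)        # umpire3 is float64 because every value is NaN
print(df.isna().sum())  # number of missing values per column
```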


## Reading only the columns you need


In [14]:
h=pd.read_csv("your.csv",usecols=["name","number"])
h

|   | name | number |
|---|------|--------|
| 0 | a    | 1      |
| 1 | b    | 2      |
| 2 | c    | 3      |
| 3 | d    | 4      |
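
Two details of `usecols` are easy to miss: the selected columns come back in file order rather than in the order listed, and `usecols` also accepts a function that is tested against each column name. The CSV text here is a two-row stand-in shaped like `your.csv`:

```python
import io
import pandas as pd

csv_text = "name,rev_name,number,number_rev\na,d,1,4\nb,c,2,3\n"

# usecols by name: the columns come back in file order, not list order.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["number", "name"])
print(by_name.columns.tolist())  # ['name', 'number']

# usecols also accepts a callable, evaluated against each column name.
picked = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda col: col == "name" or col.endswith("_rev"),
)
print(picked.columns.tolist())  # ['name', 'number_rev']

# To get a specific order, reorder after reading.
print(by_name[["number", "name"]].columns.tolist())  # ['number', 'name']
```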


## Merging two DataFrames
<figure>
<center>
<img src='https://i.stack.imgur.com/hMKKt.jpg' />
</center>
</figure>


In [15]:
d = {'id': [1, 2, 10], 
     'val1': ['a', 'b', 'c']}
  
a = pd.DataFrame(d)
a

|   | id | val1 |
|---|----|------|
| 0 | 1  | a    |
| 1 | 2  | b    |
| 2 | 10 | c    |


In [16]:
d = {'id': [1, 2, 8],
     'val1': ['p', 'q', 'r']}
b = pd.DataFrame(d)
  
# printing the dataframe
b

|   | id | val1 |
|---|----|------|
| 0 | 1  | p    |
| 1 | 2  | q    |
| 2 | 8  | r    |


**Inner Join**: Inner join is the most common type of join. It returns a DataFrame containing only the rows whose key (here `id`) appears in both DataFrames, similar to the intersection of two sets. Non-key columns that exist in both inputs (here `val1`) are kept apart with the default suffixes `_x` and `_y`.
<figure>
<center>
<img src='https://media.geeksforgeeks.org/wp-content/uploads/20201213140512/Screenshot16151.png' />
</center>
</figure>

In [17]:
# inner join
df1 = pd.merge(a, b, on='id', how='inner')
  
# display dataframe
df1

|   | id | val1_x | val1_y |
|---|----|--------|--------|
| 0 | 1  | a      | p      |
| 1 | 2  | b      | q      |
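
The `_x`/`_y` suffixes are only defaults. A sketch using the `suffixes` parameter of `pd.merge` to give the overlapping columns clearer names (`_a` and `_b` are arbitrary choices):

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
b = pd.DataFrame({"id": [1, 2, 8], "val1": ["p", "q", "r"]})

# Non-key columns present in both inputs get "_x"/"_y" appended by default;
# `suffixes` picks the names instead.
inner = pd.merge(a, b, on="id", how="inner", suffixes=("_a", "_b"))
print(inner)
print(inner.columns.tolist())  # ['id', 'val1_a', 'val1_b']
```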


**Left Outer Join:** A left outer join keeps every row of the first (left) DataFrame, whether or not its key is found in the second (right) DataFrame. Rows of the second DataFrame are included only if their key appears in the first one; where a left key has no match, the right-hand columns are filled with NaN.
<figure>
<center>
<img src='https://media.geeksforgeeks.org/wp-content/uploads/20201213140551/Screenshot16151.png' />
</center>
</figure>

In [18]:
# left outer join
df = pd.merge(a, b, on='id', how='left')
  
# display dataframe
df

|   | id | val1_x | val1_y |
|---|----|--------|--------|
| 0 | 1  | a      | p      |
| 1 | 2  | b      | q      |
| 2 | 10 | c      | NaN    |
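
To see which rows found a partner, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` or `right_only`. A sketch that uses it to pick out the unmatched left rows:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
b = pd.DataFrame({"id": [1, 2, 8], "val1": ["p", "q", "r"]})

# indicator=True records where each row came from in a "_merge" column.
left = pd.merge(a, b, on="id", how="left", indicator=True)
print(left)

# Rows of `a` with no partner in `b` are marked "left_only".
unmatched = left[left["_merge"] == "left_only"]
print(unmatched["id"].tolist())  # [10]
```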


**Right Outer Join:** A right outer join is the mirror image of a left join: it keeps every row of the second (right) DataFrame, and includes rows of the first (left) DataFrame only if their key appears in the second one. Unmatched rows get NaN in the left-hand columns.
<figure>
<center>
<img src='https://media.geeksforgeeks.org/wp-content/uploads/20201213140644/Screenshot16151.png' />
</center>
</figure>

In [19]:
# right outer join
df = pd.merge(a, b, on='id', how='right')
  
# display dataframe
df

|   | id | val1_x | val1_y |
|---|----|--------|--------|
| 0 | 1  | a      | p      |
| 1 | 2  | b      | q      |
| 2 | 8  | NaN    | r      |
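
Since a right join is the mirror image of a left join, swapping the two inputs and using `how="left"` gives the same rows. The sketch checks this; the suffixes are swapped too so the column names line up, and `reset_index` is used only to make the comparison independent of index type:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
b = pd.DataFrame({"id": [1, 2, 8], "val1": ["p", "q", "r"]})

right = pd.merge(a, b, on="id", how="right")

# Swap the inputs and use a left join; swap the suffixes as well so that
# a's column is still val1_x and b's is still val1_y.
swapped = pd.merge(b, a, on="id", how="left", suffixes=("_y", "_x"))

# Same rows and values; only the column order differs, so align it first.
same = right.reset_index(drop=True).equals(
    swapped[right.columns].reset_index(drop=True)
)
print(same)  # True
```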


**Full Outer Join:** A full outer join returns all the rows from both DataFrames, matching rows up where the keys agree and filling in NaN elsewhere. If every key appears in both DataFrames, the result is the same as an inner join.
<figure>
<center>
<img src='https://media.geeksforgeeks.org/wp-content/uploads/20201213140725/Screenshot16151.png' />
</center>
</figure>

In [20]:
# full outer join
df = pd.merge(a, b, on='id', how='outer')
  
# display dataframe
df

|   | id | val1_x | val1_y |
|---|----|--------|--------|
| 0 | 1  | a      | p      |
| 1 | 2  | b      | q      |
| 2 | 10 | c      | NaN    |
| 3 | 8  | NaN    | r      |
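
The row order of an outer join's result may differ between pandas versions. Passing `sort=True` sorts the result by the join key explicitly, and `indicator=True` shows which side each row came from. A sketch:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
b = pd.DataFrame({"id": [1, 2, 8], "val1": ["p", "q", "r"]})

# sort=True orders rows by the join key; indicator=True adds "_merge".
outer = pd.merge(a, b, on="id", how="outer", sort=True, indicator=True)
print(outer)
print(outer["id"].tolist())  # [1, 2, 8, 10]
```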


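**Index Join:** To merge DataFrames on their indices, pass `left_index=True` and `right_index=True`. Rows are then matched by index label using the default inner join. Because `id` is now an ordinary column present in both DataFrames, it gets suffixed as well (`id_x`, `id_y`).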

In [21]:
# index join
df = pd.merge(a, b, left_index=True, right_index=True)
  
# display dataframe
df

|   | id_x | val1_x | id_y | val1_y |
|---|------|--------|------|--------|
| 0 | 1    | a      | 1    | p      |
| 1 | 2    | b      | 2    | q      |
| 2 | 10   | c      | 8    | r      |
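
`DataFrame.join` is a shorthand for joining on the index. It defaults to a left join and needs `lsuffix`/`rsuffix` when the two frames share column names. Here every index label matches, so it gives the same result as the merge above:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 10], "val1": ["a", "b", "c"]})
b = pd.DataFrame({"id": [1, 2, 8], "val1": ["p", "q", "r"]})

# join() matches rows by index; suffixes are required for the shared columns.
via_join = a.join(b, lsuffix="_x", rsuffix="_y")
via_merge = pd.merge(a, b, left_index=True, right_index=True)

print(via_join.columns.tolist())  # ['id_x', 'val1_x', 'id_y', 'val1_y']
print(via_join.equals(via_merge))  # True
```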
