## Pandas

### What is Pandas
##### Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

###### https://pandas.pydata.org/about/index.html

#### Pandas Series
###### A Pandas Series is like a column in a table. It is a 1-D array holding data of any type.

#### Importing Pandas

In [114]:
import numpy as np
import pandas as pd

#### Series from lists

In [152]:
# from strings
country = ["India","Pakistan","USA","Nepal","Sri Lanka"]
pd.Series(country)
# pandas usually stores strings with dtype object
# 0, 1, 2, 3, 4 on the left are the default index labels

0        India
1     Pakistan
2          USA
3        Nepal
4    Sri Lanka
dtype: object

In [153]:
# from integers
runs = [13,24,56,78,100]
pd.Series(runs)

0     13
1     24
2     56
3     78
4    100
dtype: int64

In [154]:
# custom index -->> you can supply your own index label for each value
marks = [67,57,89,100]
subjects = ["Maths","English","Science","Hindi"]
pd.Series(marks,index=subjects)

Maths       67
English     57
Science     89
Hindi      100
dtype: int64
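
With a custom index, values can be looked up by label as well as by position. A minimal sketch reusing the marks and subjects from the cell above (the variable name `s` is only for illustration):

```python
import pandas as pd

marks = [67, 57, 89, 100]
subjects = ["Maths", "English", "Science", "Hindi"]
s = pd.Series(marks, index=subjects)

# label-based lookup uses the custom index
science = s["Science"]
# position-based lookup is still available through .iloc
first = s.iloc[0]
# selecting several labels returns a smaller Series
subset = s[["Maths", "Hindi"]]

print(science, first)
print(subset)
```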

In [155]:
# setting a name -->> for your series object
marks = pd.Series(marks,index=subjects,name = "Nitish k marks")
marks

Maths       67
English     57
Science     89
Hindi      100
Name: Nitish k marks, dtype: int64

#### Series from dict

In [156]:
marks = {
    "Maths": 67,
    "English": 57,
    "Science": 89,
    "Hindi": 100
}
marks_series = pd.Series(marks,name = "Nitish k marks")
marks_series

Maths       67
English     57
Science     89
Hindi      100
Name: Nitish k marks, dtype: int64

#### Series Attributes

In [157]:
#size
marks_series.size

4

In [158]:
# dtype
marks_series.dtype

dtype('int64')

In [159]:
# name
marks_series.name

'Nitish k marks'

In [160]:
# is_unique -->> True if every value occurs only once, False if any value is repeated
marks_series.is_unique

True
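
To see `is_unique` return False, a small sketch with made-up marks in which one value repeats (both Series here are hypothetical, not part of the notebook data):

```python
import pandas as pd

unique_marks = pd.Series([67, 57, 89, 100])
repeated_marks = pd.Series([67, 57, 67, 100])  # 67 appears twice

print(unique_marks.is_unique)    # every value is distinct
print(repeated_marks.is_unique)  # a repeated value makes this False
print(repeated_marks.nunique())  # number of distinct values
```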

In [161]:
# index -->> gives all the index values
marks_series.index


Index(['Maths', 'English', 'Science', 'Hindi'], dtype='object')

In [162]:
# values
print(marks_series.values)
print(type(marks_series.values))

[ 67  57  89 100]
<class 'numpy.ndarray'>


#### Series using read_csv

###### csv -->> comma-separated values

In [163]:
# with one column
pd.read_csv(r'C:\Users\deepu\OneDrive\Desktop\Python\subs.csv')

     Subscribers gained
0                    48
1                    57
2                    40
3                    43
4                    44
..                  ...
360                 231
361                 226
362                 155
363                 144


##### The output above is not a Series: read_csv returns a DataFrame by default. You can check this with type(); to get a Series, call subs.squeeze()

In [164]:
type(pd.read_csv(r'C:\Users\deepu\OneDrive\Desktop\Python\subs.csv'))

pandas.core.frame.DataFrame

In [165]:
# Now convert the DataFrame into a Series by calling squeeze() (it returns a new object; subs itself stays a DataFrame)
subs = pd.read_csv(r'C:\Users\deepu\OneDrive\Desktop\Python\subs.csv')
subs.squeeze()

0       48
1       57
2       40
3       43
4       44
      ... 
360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, Length: 365, dtype: int64

##### NOTE: If the data is too long, pandas displays only the first 5 and last 5 rows
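
The truncation is only a display setting, not a change to the data. A hedged sketch, assuming pandas' default display options and a made-up 100-row Series, that prints every row inside a temporary `option_context`:

```python
import pandas as pd

s = pd.Series(range(100))  # hypothetical data, long enough to be truncated

short_repr = repr(s)  # truncated with the default display options
with pd.option_context("display.max_rows", None):
    full_repr = repr(s)  # every row is printed inside this block

print(len(short_repr.splitlines()), len(full_repr.splitlines()))
```

Outside the `with` block the previous display setting is restored automatically.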

In [166]:
# The result is now a Series
type(subs.squeeze())

pandas.core.series.Series
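
Note that `squeeze()` returns a new object and leaves the DataFrame untouched, so the result has to be assigned to a name to keep it. A self-contained sketch, using an in-memory CSV as a hypothetical stand-in for `subs.csv`:

```python
import io
import pandas as pd

# hypothetical stand-in for subs.csv: a single data column
csv_text = "Subscribers gained\n48\n57\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))  # DataFrame

# squeeze() returns a new object; df itself is unchanged
s = df.squeeze("columns")
print(type(s))   # Series
print(type(df))  # still a DataFrame
```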

In [167]:
# 2nd dataset
vk = pd.read_csv(r"C:\Users\deepu\OneDrive\Desktop\Python\kohli_ipl.csv",index_col="match_no")
vk.squeeze()

match_no
1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Name: runs, Length: 215, dtype: int64

In [190]:
# 3rd dataset
movies = pd.read_csv(r"C:\Users\deepu\OneDrive\Desktop\Python\bollywood.csv",index_col="movie")
movies.squeeze()

movie
Uri: The Surgical Strike                   Vicky Kaushal
Battalion 609                                Vicky Ahuja
The Accidental Prime Minister (film)         Anupam Kher
Why Cheat India                            Emraan Hashmi
Evening Shadows                         Mona Ambegaonkar
                                              ...       
Hum Tumhare Hain Sanam                    Shah Rukh Khan
Aankhen (2002 film)                     Amitabh Bachchan
Saathiya (film)                             Vivek Oberoi
Company (film)                                Ajay Devgn
Awara Paagal Deewana                        Akshay Kumar
Name: lead, Length: 1500, dtype: object

#### Series methods

In [169]:
# head and tail -->> use these when you only want a quick look at the data rather than every row (head() returns the first 5 rows by default)
subs.head()

     Subscribers gained
0                    48
1                    57
2                    40
3                    43
4                    44


In [170]:
vk.head(4)

          runs
match_no
1            1
2           23
3           13
4           12


In [171]:
# tail -->> last n rows
vk.tail(5)

          runs
match_no
211          0
212         20
213         73
214         25
215          7


In [172]:
# sample -->> returns one randomly chosen row (pass n to get more)
movies.sample()

                           lead
movie
Mission Istaanbul  Vivek Oberoi


In [173]:
movies.sample(5)

                                              lead
movie
Khushi (2003 Hindi film)              Fardeen Khan
404 (film)                              Sara Arjun
Manikarnika: The Queen of Jhansi    Kangana Ranaut
The Blueberry Hunt                Kartik Elangovan
68 Pages                             Mouli Ganguly


In [174]:
# value_counts
movies.value_counts()
# Gives the frequency of each value -->> here the values are lead actors, so value_counts shows how many movies each actor has in this data

lead            
Akshay Kumar        48
Amitabh Bachchan    45
Ajay Devgn          38
Salman Khan         31
Sanjay Dutt         26
                    ..
Kashmira Shah        1
Kartik Elangovan     1
Karisma Kapoor       1
Karan Sharma         1
Zulfi Sayed          1
Name: count, Length: 566, dtype: int64
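
The same idea on a tiny Series makes the counting easy to verify by eye. This is a sketch with placeholder labels (`"A"`, `"B"`, `"C"` are hypothetical, not actors from the data); `normalize=True` returns fractions instead of counts:

```python
import pandas as pd

# hypothetical miniature of a "lead" column
leads = pd.Series(["A", "B", "A", "C", "A", "B"])

counts = leads.value_counts()                # most frequent value first
shares = leads.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```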

In [175]:
# sort_values -->> also accepts inplace=True
vk.squeeze().sort_values()
# returns a sorted copy; values are in ascending order by default

match_no
87       0
211      0
207      0
206      0
91       0
      ... 
164    100
120    100
123    108
126    109
128    113
Name: runs, Length: 215, dtype: int64

In [176]:
vk.squeeze().sort_values(ascending=False)

match_no
128    113
126    109
123    108
164    100
120    100
      ... 
93       0
211      0
130      0
8        0
135      0
Name: runs, Length: 215, dtype: int64

In [177]:
# an example of method chaining
# to get Virat Kohli's highest IPL score: sort descending, take head(1), then read the number with values[0]
vk.squeeze().sort_values(ascending=False).head(1).values[0]

113
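
The chain above works, but the same number is available without sorting. A short sketch on made-up runs (the data and variable names are hypothetical) comparing the chain with `max()`, plus `idxmax()` to find which match it came from:

```python
import pandas as pd

# hypothetical runs indexed by match number
runs = pd.Series([1, 23, 113, 12], index=[1, 2, 3, 4], name="runs")

via_chain = runs.sort_values(ascending=False).head(1).values[0]
via_max = runs.max()        # same value, no sorting needed
best_match = runs.idxmax()  # index label where the maximum occurs

print(via_chain, via_max, best_match)
```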

In [184]:
vk.sort_values(ascending=True,inplace=True)
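# This fails because vk is still a DataFrame: vk.squeeze() above returned a Series that was never assigned back to vk,
# and DataFrame.sort_values() requires the by= argument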

TypeError: DataFrame.sort_values() missing 1 required positional argument: 'by'
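
Two possible ways around the error above, sketched on a small made-up DataFrame (`vk_df` and `vk_s` are hypothetical names, not the notebook's variables): either name the column to sort by, or keep the squeezed Series and sort that in place.

```python
import pandas as pd

# hypothetical stand-in for the vk DataFrame (one column, match_no index)
vk_df = pd.DataFrame({"runs": [23, 1, 13]},
                     index=pd.Index([1, 2, 3], name="match_no"))

# Option 1: stay with the DataFrame and name the column to sort by
vk_df.sort_values(by="runs", inplace=True)

# Option 2: keep the squeezed Series (as an independent copy), then sort it in place
vk_s = vk_df.squeeze("columns").copy()
vk_s.sort_values(ascending=False, inplace=True)

print(vk_df)
print(vk_s)
```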

In [185]:
# sort_index -->> inplace
movies

                                                  lead
movie
Uri: The Surgical Strike                 Vicky Kaushal
Battalion 609                              Vicky Ahuja
The Accidental Prime Minister (film)       Anupam Kher
Why Cheat India                          Emraan Hashmi
Evening Shadows                       Mona Ambegaonkar
...                                                ...
Hum Tumhare Hain Sanam                  Shah Rukh Khan
Aankhen (2002 film)                   Amitabh Bachchan
Saathiya (film)                           Vivek Oberoi
Company (film)                              Ajay Devgn


In [186]:
movies.sort_index()

                                        lead
movie
1920 (film)                 Rajniesh Duggall
1920: London                   Sharman Joshi
1920: The Evil Returns           Vicky Ahuja
1971 (2007 film)              Manoj Bajpayee
2 States (2014 film)            Arjun Kapoor
...                                      ...
Zindagi 50-50                    Veena Malik
Zindagi Na Milegi Dobara      Hrithik Roshan
Zindagi Tere Naam         Mithun Chakraborty
Zokkomon                     Darsheel Safary


##### NOTE: sort_index(inplace=True) and sort_values(inplace=True) modify the object permanently, so be careful

In [187]:
movies.sort_index(inplace=True)
# This changes movies in place and returns None, which is why no output is shown
# for descending order use -->> movies.sort_index(ascending=False,inplace=True)

In [188]:
# Now you can run movies and see the permanent change (i.e. movies stays sorted: titles starting with digits first, then A to Z)
movies

                                        lead
movie
1920 (film)                 Rajniesh Duggall
1920: London                   Sharman Joshi
1920: The Evil Returns           Vicky Ahuja
1971 (2007 film)              Manoj Bajpayee
2 States (2014 film)            Arjun Kapoor
...                                      ...
Zindagi 50-50                    Veena Malik
Zindagi Na Milegi Dobara      Hrithik Roshan
Zindagi Tere Naam         Mithun Chakraborty
Zokkomon                     Darsheel Safary


##### To undo the in-place change, re-run the cell where movies was first created (the read_csv cell). Running movies again then shows the data unsorted, as below. Do not re-run the inplace=True cell afterwards, or the change is applied again

In [191]:
movies

                                                  lead
movie
Uri: The Surgical Strike                 Vicky Kaushal
Battalion 609                              Vicky Ahuja
The Accidental Prime Minister (film)       Anupam Kher
Why Cheat India                          Emraan Hashmi
Evening Shadows                       Mona Ambegaonkar
...                                                ...
Hum Tumhare Hain Sanam                  Shah Rukh Khan
Aankhen (2002 film)                   Amitabh Bachchan
Saathiya (film)                           Vivek Oberoi
Company (film)                              Ajay Devgn
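
An alternative to re-reading the file is to keep a copy before any in-place operation. A minimal sketch on a made-up three-row Series (the lead values and variable names are hypothetical) that also shows `inplace=True` returning None:

```python
import pandas as pd

# hypothetical stand-in for the movies data
movies_small = pd.Series(
    ["lead A", "lead B", "lead C"],
    index=["Zokkomon", "1920 (film)", "Company (film)"],
    name="lead",
)

original = movies_small.copy()                  # untouched backup
result = movies_small.sort_index(inplace=True)  # mutates, returns None

print(result)
print(movies_small.index.tolist())  # sorted labels
print(original.index.tolist())      # original order preserved
```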


#### Series Maths Methods

In [193]:
# count -->> similar to size, but count excludes missing values while size includes them
vk.count() # number of matches Virat Kohli played

runs    215
dtype: int64

In [195]:
movies.count()
# i.e. there are 1500 movies in this data

lead    1500
dtype: int64
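
The datasets above have no missing values, so `count` and `size` agree on them. A small sketch with made-up data containing NaN shows where they differ:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30, np.nan])  # hypothetical data with two gaps

print(s.size)          # all entries, missing or not
print(s.count())       # only the non-missing entries
print(s.isna().sum())  # number of missing entries
```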

In [198]:
# sum -->> product
print(vk.sum())
print(subs.sum())
print(movies.sum())
# for strings, sum concatenates the values; product() works like sum but multiplies the items

runs    6634
dtype: int64
Subscribers gained    49510
dtype: int64
lead    Vicky KaushalVicky AhujaAnupam KherEmraan Hash...
dtype: object
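
The cell above only calls `sum`. A short sketch of both reductions on made-up values, including the string concatenation seen in the `movies` output (the `dtype=object` is spelled out here as an assumption, to match the object dtype shown above):

```python
import pandas as pd

nums = pd.Series([2, 3, 4])
total = nums.sum()      # 2 + 3 + 4
product = nums.prod()   # 2 * 3 * 4 (product() is an alias)

# summing object-dtype strings concatenates them
names = pd.Series(["ab", "cd"], dtype=object)
joined = names.sum()

print(total, product, joined)
```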


In [205]:
# mean -->> median -->> mode -->> std -->> var
print(subs.mean())
print(vk.median())
print(movies.mode())
# mode -->> the value that occurs most often in the data
# the 0 before Akshay Kumar is the index label; Akshay Kumar is the lead in the most movies
print(subs.std())
print(vk.var())

Subscribers gained    135.643836
dtype: float64
runs    24.0
dtype: float64
           lead
0  Akshay Kumar
Subscribers gained    62.675023
dtype: float64
runs    688.002478
dtype: float64
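
How these statistics relate is easier to check on a handful of numbers. A sketch with made-up values: `std` is the square root of `var`, both use the sample formula (`ddof=1`) by default, and `mode` returns a Series because several values can tie.

```python
import math
import pandas as pd

s = pd.Series([1, 2, 2, 3, 7])  # hypothetical values

mean = s.mean()
median = s.median()
modes = s.mode()               # a Series, since ties are possible
var = s.var()                  # sample variance (divides by n - 1)
std = s.std()                  # square root of the sample variance
pop_var = s.var(ddof=0)        # population variance (divides by n)

print(mean, median, modes.tolist(), var, std, pop_var)
```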


In [208]:
# min/max
print(subs.max())
print(subs.min())
print(vk.min())
print(vk.max())

Subscribers gained    396
dtype: int64
Subscribers gained    33
dtype: int64
runs    0
dtype: int64
runs    113
dtype: int64


In [210]:
# describe
vk.describe()

            runs
count      215.0
mean   30.855814
std    26.229801
min          0.0
25%          9.0
50%         24.0
75%         48.0
max        113.0


In [211]:
subs.describe()

       Subscribers gained
count               365.0
mean           135.643836
std             62.675023
min                  33.0
25%                  88.0
50%                 123.0
75%                 177.0
max                 396.0
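
The result of `describe` is itself a pandas object, so individual statistics can be read by label. A sketch on a made-up Series (the data and the chosen percentiles are hypothetical):

```python
import pandas as pd

s = pd.Series([0, 10, 20, 30, 40], name="runs")  # hypothetical data

summary = s.describe()  # a Series indexed by statistic name
median = summary["50%"]
value_range = summary["max"] - summary["min"]

# the percentiles argument adds other cut points to the summary
custom = s.describe(percentiles=[0.1, 0.9])

print(median, value_range)
print(custom)
```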
