### What is Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

https://pandas.pydata.org/about/index.html

### Pandas Series
A Pandas Series is like a column in a table. It is a 1-D array holding data of any type.
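
To make the "column in a table" picture concrete, here is a minimal sketch. The values are made up for illustration and are not from the notebook's datasets. It shows that a Series is one-dimensional and that its dtype depends on what it holds.

```python
import pandas as pd

# Hypothetical values chosen only for illustration.
ints = pd.Series([10, 20, 30])
mixed = pd.Series([10, 'twenty', 30.0])

print(ints.ndim)          # 1: a Series is one-dimensional
print(list(ints.index))   # [0, 1, 2]: default integer labels
print(ints.dtype)         # an integer dtype (typically int64)
print(mixed.dtype)        # object: mixed Python types fall back to object
```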

### Importing Pandas

In [32]:
import numpy as np
import pandas as pd


### Series from lists

In [33]:
# String
country = ['India', 'Pakistan', 'USA', 'Nepal', 'Srilanka']

pd.Series(country)


0,India
1,Pakistan
2,USA
3,Nepal
4,Srilanka


In [34]:
# integers
run = [29,45,37,58,38]

pd.Series(run)

0,29
1,45
2,37
3,58
4,38


In [35]:
# custom index
marks = [80,77,90,88]
subjects = ['maths','english','science','hindi']

pd.Series(marks,index = subjects)


maths,80
english,77
science,90
hindi,88


In [36]:
# setting a name
marks=pd.Series(marks,index = subjects, name='Some Marks')
marks

,Some Marks
maths,80
english,77
science,90
hindi,88


### Series from dict

In [37]:
marks = {
    'maths':67,
    'english':77,
    'science':89,
    'hindi':100
}

marks_series = pd.Series(marks,name='Some Marks')
marks_series

,Some Marks
maths,67
english,77
science,89
hindi,100
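
When a Series is built from a dict, the keys become the index. As a small sketch of what happens when an explicit `index` is also passed: pandas selects and orders entries by key, and a key missing from the dict becomes `NaN`. The key `'sanskrit'` below is a hypothetical addition used only to show the missing case.

```python
import pandas as pd

marks = {'maths': 67, 'english': 77, 'science': 89, 'hindi': 100}

# index= selects and orders the dict entries by key.
# 'sanskrit' is a hypothetical key that is not in the dict, so it becomes NaN.
subset = pd.Series(marks, index=['hindi', 'maths', 'sanskrit'])

print(subset)
print(subset.dtype)  # float64, because NaN cannot be stored in an int64 Series
```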


### Series Attributes

In [38]:
# size
marks_series.size

4

In [39]:
# dtype
marks_series.dtype

dtype('int64')

In [40]:
# name
marks_series.name

'Some Marks'

In [41]:
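# is_unique: True if the Series contains no duplicate values
marks_series.is_unique # True, but not displayed (only the last expression in a cell is shown)

pd.Series([1,1,2,3,4,5]).is_unique # False, because 1 appears twice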

False

In [42]:
# index
marks_series.index

Index(['maths', 'english', 'science', 'hindi'], dtype='object')

In [43]:
# values
marks_series.values

array([ 67,  77,  89, 100])
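
As a side note to `.values`, the pandas documentation recommends `Series.to_numpy()` when a NumPy array is needed. A minimal sketch, rebuilding the same `marks_series` so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

marks_series = pd.Series(
    {'maths': 67, 'english': 77, 'science': 89, 'hindi': 100},
    name='Some Marks',
)

arr = marks_series.to_numpy()        # plain NumPy array, index labels are dropped
print(isinstance(arr, np.ndarray))   # True
print(arr.shape)                     # (4,)
print(arr.sum())                     # 333
```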

### Series using read_csv

In [44]:
# with one col
pd.read_csv('/content/subs.csv')
type(pd.read_csv('/content/subs.csv')) # read_csv returns a DataFrame by default, even for a single-column file

In [45]:
# with one col
# subs = pd.read_csv('/content/subs.csv',squeeze=True)
# subs
# the above will not work in pandas 2.0+: the squeeze parameter was deprecated in pandas 1.4 and removed in 2.0

subs = pd.read_csv("subs.csv")["Subscribers gained"]
subs


,Subscribers gained
0,48
1,57
2,40
3,43
4,44
...,...
360,231
361,226
362,155
363,144
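
Besides selecting the column by name, a one-column DataFrame can be turned into a Series with `DataFrame.squeeze("columns")`, which is the replacement the pandas docs suggest for the removed `squeeze=True` argument. The sketch below uses an in-memory stand-in for `subs.csv` (an assumption, so the snippet runs without the file), containing only the first three values shown above.

```python
import io
import pandas as pd

# Stand-in for subs.csv: a single-column CSV with the first three rows shown above.
csv_text = "Subscribers gained\n48\n57\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
subs_demo = df.squeeze("columns")   # one-column DataFrame -> Series

print(type(df).__name__)            # DataFrame
print(type(subs_demo).__name__)     # Series
print(subs_demo.name)               # Subscribers gained
```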


In [46]:
# with 2 cols
vk=pd.read_csv('/content/kohli_ipl.csv').set_index('match_no')['runs']
vk

match_no,runs
1,1
2,23
3,13
4,12
5,1
...,...
211,0
212,20
213,73
214,25


In [47]:
movie = pd.read_csv('/content/bollywood.csv').set_index('movie')['lead']
movie

movie,lead
Uri: The Surgical Strike,Vicky Kaushal
Battalion 609,Vicky Ahuja
The Accidental Prime Minister (film),Anupam Kher
Why Cheat India,Emraan Hashmi
Evening Shadows,Mona Ambegaonkar
...,...
Hum Tumhare Hain Sanam,Shah Rukh Khan
Aankhen (2002 film),Amitabh Bachchan
Saathiya (film),Vivek Oberoi
Company (film),Ajay Devgn
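
The `set_index(...)[...]` pattern used for `vk` and `movie` can also be written with the `index_col` argument of `read_csv`, which sets the index while the file is read. A minimal sketch, assuming an in-memory stand-in for `kohli_ipl.csv` that holds only the first three rows shown above:

```python
import io
import pandas as pd

# Stand-in for kohli_ipl.csv with a few rows copied from the output above.
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

# index_col makes match_no the index; selecting 'runs' then yields a Series.
vk_demo = pd.read_csv(io.StringIO(csv_text), index_col='match_no')['runs']

print(type(vk_demo).__name__)   # Series
print(vk_demo.index.name)       # match_no
print(vk_demo.loc[2])           # 23
```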


### Series methods

In [48]:
# head and tail
subs.head()

,Subscribers gained
0,48
1,57
2,40
3,43
4,44


In [49]:
vk.head(3) # first 3 rows

match_no,runs
1,1
2,23
3,13


In [50]:
vk.tail() # last 5 rows (the default)

match_no,runs
211,0
212,20
213,73
214,25
215,7


In [51]:
vk.tail(3) # last 3

match_no,runs
213,73
214,25
215,7


In [56]:
# sample
movie.sample() # picks one row at random
# useful when the order of the data may be biased: sampling at random avoids always inspecting the same rows

movie,lead
Prassthanam,Sanjay Dutt


In [61]:
movie.sample(5) # random 5

movie,lead
Gori Tere Pyaar Mein,Imran Khan
Billu,Irrfan Khan
Go Goa Gone,Saif Ali Khan
Shaadi Se Pehle,Akshaye Khanna
Kaalo,Aditya Srivastava
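
Because `sample` is random, the rows change on every run. If a repeatable sample is needed, `sample` accepts a `random_state` seed. A minimal sketch on a made-up Series (the data and the seed value 42 are arbitrary choices for illustration):

```python
import pandas as pd

s = pd.Series(range(100))  # hypothetical data

first = s.sample(5, random_state=42)
second = s.sample(5, random_state=42)

print(len(first))            # 5
print(first.equals(second))  # True: the same seed gives the same rows
```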


In [64]:
# value_counts -> counts frequencies in the data
# like how many movies per actor
movie.value_counts()

lead,count
Akshay Kumar,48
Amitabh Bachchan,45
Ajay Devgn,38
Salman Khan,31
Sanjay Dutt,26
...,...
Diganth,1
Parveen Kaur,1
Seema Azmi,1
Akanksha Puri,1
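
`value_counts` can also report proportions instead of raw counts via `normalize=True`. A small sketch with hypothetical labels in place of the actor names:

```python
import pandas as pd

leads = pd.Series(['A', 'B', 'A', 'A'])  # hypothetical labels

counts = leads.value_counts()                 # sorted by frequency, highest first
shares = leads.value_counts(normalize=True)   # fractions that sum to 1

print(counts['A'])   # 3
print(shares['A'])   # 0.75
```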


In [68]:
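# sort_values: returns a new sorted Series (vk itself is unchanged unless inplace=True)
vk.sort_values(ascending = False)
vk.sort_values(ascending = False).head(1)
vk.sort_values(ascending = False).head(1).values[0] # highest score; only this last expression is displayed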

113

In [None]:
# vk.sort_values(inplace=True) # this will cause permanent change in data
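
To illustrate the difference between the default behaviour and `inplace=True`, here is a minimal sketch on a four-row stand-in for `vk` (the first four rows shown earlier). It also shows that `max()` gives the same top value as the sort-then-`head(1)` chain.

```python
import pandas as pd

# Stand-in for vk: first four rows from the output earlier in the notebook.
runs = pd.Series([1, 23, 13, 12], index=[1, 2, 3, 4], name='runs')

sorted_runs = runs.sort_values(ascending=False)
print(runs.tolist())         # [1, 23, 13, 12]: the original order is unchanged
print(sorted_runs.tolist())  # [23, 13, 12, 1]
print(runs.max())            # 23, same as sorted_runs.head(1).values[0]

runs.sort_values(inplace=True)  # now runs itself is reordered (ascending by default)
print(runs.tolist())         # [1, 12, 13, 23]
```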

In [70]:
# sort_index: sorts by index label (here, the movie title) and returns a new Series
movie.sort_index()
# movie.sort_index(inplace = True) # would sort movie itself instead

movie,lead
1920 (film),Rajniesh Duggall
1920: London,Sharman Joshi
1920: The Evil Returns,Vicky Ahuja
1971 (2007 film),Manoj Bajpayee
2 States (2014 film),Arjun Kapoor
...,...
Zindagi 50-50,Veena Malik
Zindagi Na Milegi Dobara,Hrithik Roshan
Zindagi Tere Naam,Mithun Chakraborty
Zokkomon,Darsheel Safary
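
Like `sort_values`, `sort_index` takes `ascending=False` for reverse order. A short sketch using three title/lead pairs copied from the `movie` output above as a stand-in for the full Series:

```python
import pandas as pd

# Three entries copied from the movie Series shown earlier.
demo = pd.Series(
    ['Vicky Kaushal', 'Emraan Hashmi', 'Ajay Devgn'],
    index=['Uri: The Surgical Strike', 'Why Cheat India', 'Company (film)'],
    name='lead',
)

reverse_sorted = demo.sort_index(ascending=False)
print(reverse_sorted.index.tolist())
# ['Why Cheat India', 'Uri: The Surgical Strike', 'Company (film)']
print(demo.index[0])  # 'Uri: The Surgical Strike': demo itself is unchanged
```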
