#Pandas Introduction
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Use:
Pandas allows us to analyze big data and draw conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

#Installation of Pandas
#If Python and pip are already installed on your system, installing Pandas is straightforward.

#Install it using this command:

pip install pandas

In [12]:
#Import Pandas
#Once Pandas is installed, import it into your application with the import keyword:

import pandas

In [14]:
#Example
import pandas

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


In [18]:
#Pandas as pd
#Pandas is usually imported under the pd alias.
import pandas as pd

In [20]:
#Example
import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


In [22]:
#Checking Pandas Version
#The version string is stored in the __version__ attribute.
#Example
import pandas as pd

print(pd.__version__)

2.1.4


#Pandas Series
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.

In [24]:
#Example
#Create a simple Pandas Series from a list:

import pandas as pd

a = [5, 7, 6]

myvar = pd.Series(a)

print(myvar)

0    5
1    7
2    6
dtype: int64


In [26]:
#Labels
#If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second has index 1, and so on.
#This label can be used to access a specified value.
#Example
#Return the first value of the Series:

print(myvar[0])

5


In [30]:
#Create Labels
#With the index argument, you can name your own labels.
#Example
#Create your own labels:

import pandas as pd

a = [5, 7, 6]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)
print(myvar["y"])

x    5
y    7
z    6
dtype: int64
7
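
Once a Series has custom labels, it can help to be explicit about whether a lookup is by label or by position. The following is a minimal sketch using the `.loc` (label) and `.iloc` (position) accessors; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Same data and labels as in the cell above
values = pd.Series([5, 7, 6], index=["x", "y", "z"])

by_label = values.loc["y"]     # look up the entry labeled "y"
by_position = values.iloc[1]   # look up the entry at position 1 (the second one)

print(by_label)
print(by_position)
```

Both lookups return the same element here, because "y" is the label of the second entry.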


In [32]:
#Key/Value Objects as Series
#Example
#Create a simple Pandas Series from a dictionary:

import pandas as pd

calories = {"day1": 540, "day2": 560, "day3": 570}

myvar = pd.Series(calories)

print(myvar)

day1    540
day2    560
day3    570
dtype: int64


In [34]:
#Example
#Create a Series using only data from "day1" and "day2":

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)

day1    420
day2    380
dtype: int64
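
A related case is worth a quick sketch: what happens when the index argument names a label that is not a key in the dictionary. In the example below, "day4" is a hypothetical label chosen for illustration; Pandas keeps it and marks its value as missing (NaN).

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is not a key in the dictionary (hypothetical label for illustration)
subset = pd.Series(calories, index=["day1", "day4"])

print(subset)
print(subset.isna())   # True where the value is missing
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.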


In [38]:
#DataFrames
#Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
#A Series is like a column; a DataFrame is the whole table.

#Example
#Create a DataFrame from a dictionary of two lists (each list becomes a column):

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)

   calories  duration
0       420        50
1       380        40
2       390        45
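
The cell above passes plain Python lists. As a sketch of the "a Series is a column" idea, the same table can also be assembled from two actual Series objects, and a single column or row pulled back out comes back as a Series. The variable names here are illustrative.

```python
import pandas as pd

calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

# Each Series becomes one column; rows are matched on the index labels
df = pd.DataFrame({"calories": calories, "duration": duration})

print(df)
print(df["calories"])   # one column, returned as a Series
print(df.loc[0])        # one row (label 0), also returned as a Series
```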


In [48]:
# Operations
# Slicing
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# Slicing by labels (the end label 'd' is included)
print(s['b':'d'])
# Slicing by position (the end position 4 is excluded)
print(s[1:4])

b    2
c    3
d    4
dtype: int64
b    2
c    3
d    4
dtype: int64
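
The two slices above return the same rows even though their end points follow different rules. A minimal sketch with the explicit `.loc` and `.iloc` accessors makes the difference visible; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_label = s.loc["b":"d"]    # label slice: the end label "d" is included
by_position = s.iloc[1:4]    # position slice: position 4 is excluded

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```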


In [54]:
# Arithmetic Operations
import pandas as pd
s1 = pd.Series([1,3,5,7])
s2 = pd.Series([2,4,6,8])
# Element-wise addition
print(s1 + s2)
# Element-wise multiplication
print(s1 * s2)

0     3
1     7
2    11
3    15
dtype: int64
0     2
1    12
2    30
3    56
dtype: int64
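
The cell above combines two Series element by element. As a sketch, here is what multiplying by a single number (scalar multiplication) looks like, along with how arithmetic matches elements by label rather than by position. The Series `left` and `right` and their labels are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 3, 5, 7])

# Scalar multiplication: every element is multiplied by the same number
doubled = s1 * 2
print(doubled)

# Arithmetic aligns on labels; labels found in only one Series give NaN
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned = left + right
print(aligned)
```

Only "b" and "c" appear in both Series, so only those two labels get a numeric sum.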


In [56]:
#Boolean Operations
s1 = pd.Series([1, 3, 5, 7])
# Element-wise comparison
print(s1 > 2)

0    False
1     True
2     True
3     True
dtype: bool
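
A boolean Series like the one above is most often used as a mask to filter values. The following is a small sketch of that pattern; the names `mask`, `filtered`, and `both` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 3, 5, 7])

mask = s1 > 2          # boolean Series, one True/False per element
filtered = s1[mask]    # keeps only the elements where the mask is True
print(filtered)

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = (s1 > 2) & (s1 < 7)
print(both)
```

Note that the filtered Series keeps the original index labels (1, 2, 3) rather than renumbering from 0.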


In [58]:
# Aggregation and Reduction
s = pd.Series([1,3,5,7])
print(s.sum())   #sum of all elements
print(s.mean())  #mean of all elements
print(s.min())   #Minimum value
print(s.max())   #Maximum value

16
4.0
1
7
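
When several of these statistics are needed at once, one possible shortcut is `agg`, which accepts a list of function names, or `describe`, which returns a standard summary. This is a sketch rather than the only way to do it.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7])

# Compute several reductions in one call; the result is a Series
summary = s.agg(["sum", "mean", "min", "max"])
print(summary)

# describe() reports count, mean, std, min, quartiles, and max
print(s.describe())
```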


In [64]:
# Sorting and Ranking
s = pd.Series([5,6,8,7,10,15])
# sorting by values
print(s.sort_values())

# sorting by index
print(s.sort_index())

#Ranking
print(s.rank())

0     5
1     6
3     7
2     8
4    10
5    15
dtype: int64
0     5
1     6
2     8
3     7
4    10
5    15
dtype: int64
0    1.0
1    2.0
2    4.0
3    3.0
4    5.0
5    6.0
dtype: float64
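
Two variations on the calls above may be useful: sorting in descending order, and ranking when values are tied. The sketch below uses a made-up Series `tied` to show how the `method` argument of `rank` changes the result for equal values.

```python
import pandas as pd

s = pd.Series([5, 6, 8, 7, 10, 15])

# Largest value first
descending = s.sort_values(ascending=False)
print(descending)

# Illustrative data with a tie between the two 20s
tied = pd.Series([10, 20, 20, 30])
average_ranks = tied.rank()            # default: tied values share the average rank
min_ranks = tied.rank(method="min")    # tied values share the lowest rank
print(average_ranks)
print(min_ranks)
```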


In [68]:
#Handling Missing Data
s = pd.Series([1,2,None,4,5])
#check for missing values
print(s.isnull())
#Fill missing value with a specific value
print(s.fillna(0))

0    False
1    False
2     True
3    False
4    False
dtype: bool
0    1.0
1    2.0
2    0.0
3    4.0
4    5.0
dtype: float64
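
Filling with 0 is only one option. As a sketch, missing entries can also be dropped, or filled with a statistic such as the mean of the remaining values. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, None, 4, 5])

# Remove the missing entry; the remaining index labels are kept as they were
dropped = s.dropna()
print(dropped)

# mean() skips missing values by default, so this fills with (1+2+4+5)/4
filled = s.fillna(s.mean())
print(filled)
```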


In [74]:
#Element-wise Functions:
import pandas as pd
# Create a Series object
s = pd.Series([4,6,9,15])
# Square root of each element
print(s**0.5)

0    2.000000
1    2.449490
2    3.000000
3    3.872983
dtype: float64
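
The power operator is one way to apply a function to every element. Two other common options, sketched below, are calling a NumPy function on the Series and using `map` with your own function. The lambda that labels values as even or odd is purely illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 6, 9, 15])

# NumPy functions work element-wise on a Series and return a Series
roots = np.sqrt(s)
print(roots)

# map() applies an arbitrary Python function to each element
parity = s.map(lambda x: "even" if x % 2 == 0 else "odd")
print(parity)
```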
