
# **Pandas**
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and tools for data manipulation and analysis. The name comes from "panel data", an econometrics term for multidimensional data sets.

Wes McKinney began developing pandas in 2008, when he needed a fast, flexible tool for data analysis.
Before pandas, Python was used mainly for data munging (converting data from one format to another) and preparation, and contributed little to the analysis itself. Pandas closed that gap. Regardless of where the data comes from, pandas supports five common steps in data processing and analysis:

1. load
2. prepare
3. manipulate
4. model
5. analyze
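
As a rough illustration of these steps, the sketch below runs a tiny workflow on made-up data. The CSV text, the column names, and the choice of filling gaps with the column mean are assumptions made for this example. The modeling step is left out because it usually relies on other libraries.

```python
import io
import pandas as pd

# Hypothetical CSV content, used here instead of a real file.
csv_text = """city,temp_c
Delhi,31.5
Delhi,
Pune,27.0
Pune,29.0
"""

# 1. Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Prepare: fill the missing temperature with the column mean.
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# 3. Manipulate: derive a new column in Fahrenheit.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# 5. Analyze: average temperature per city.
summary = df.groupby("city")["temp_c"].mean()
print(summary)
```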

# **Key Features of Pandas**
1. Fast and efficient DataFrame object with default and customized indexing.
2. Tools for loading data into in-memory data objects from different file formats.
3. Data alignment and integrated handling of missing data.
4. Reshaping and pivoting of data sets.
5. Label-based slicing, indexing and subsetting of large data sets.
6. Columns from a data structure can be deleted or inserted.
7. Group by data for aggregation and transformations.
8. High performance merging and joining of data.
9. Time Series functionality.
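
A few of these features can be sketched in a handful of lines. The data, the variable names, and the column names below are invented for illustration only.

```python
import pandas as pd

# Data alignment: values are matched on labels, not on position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])
total = a + b  # "x" has no partner in b, so the result there is NaN
print(total)

# Group by: aggregate marks per team (made-up data).
scores = pd.DataFrame({"team": ["A", "A", "B"], "marks": [80, 90, 70]})
mean_marks = scores.groupby("team")["marks"].mean()
print(mean_marks)

# Merging: join two tables on a shared key column.
captains = pd.DataFrame({"team": ["A", "B"], "captain": ["Tom", "Jack"]})
merged = scores.merge(captains, on="team")
print(merged)
```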

# **Series**

A Series is a one-dimensional labeled array that can hold data of any type. It can be created using the following constructor:

pandas.Series(data, index, dtype, copy)

Description of parameters

1. data

The values to store. It can take various forms such as an ndarray, a list, a dictionary, or a scalar constant.
2. index

Index values must be hashable and the same length as data. They do not have to be unique. Defaults to 0, 1, ..., n-1 (the same values as np.arange(n)) if no index is passed.
3. dtype

The data type of the values. If None, the data type is inferred.
4. copy

Whether to copy the input data. Default False.
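
The parameters can be tried out together in a short sketch. The array contents and labels are arbitrary, and the example only demonstrates `copy=True`, whose effect is easy to observe.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# index supplies the labels; dtype forces the element type.
s = pd.Series(values, index=["a", "b", "c"], dtype="float64")
print(s)

# copy=True gives the Series its own data, so later changes to
# the source array do not show up in the Series.
s_copy = pd.Series(values, copy=True)
values[0] = 99
print(s_copy.iloc[0])
```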

A Series can be created empty, or from any of the following inputs:
1. Array
2. Dictionary
3. Scalar Values or constants


In [2]:
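import pandas as pd
# Pass dtype explicitly: newer pandas versions no longer default an empty Series to float64
s = pd.Series(dtype='float64')
print(s)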

Series([], dtype: float64)


  


In [3]:
import numpy as np
import pandas as pd
data = np.array(['a','e','i','o','u'])
print(data)
print(type(data))
s= pd.Series(data)
print(s)
print(type(s))

['a' 'e' 'i' 'o' 'u']
<class 'numpy.ndarray'>
0    a
1    e
2    i
3    o
4    u
dtype: object
<class 'pandas.core.series.Series'>


In [4]:
data = np.array(['a','e','i','o','u'])
print(data)
print(type(data))
s= pd.Series(data, index=[100,101,102, 103, 104])
print(s)
print(type(s))

['a' 'e' 'i' 'o' 'u']
<class 'numpy.ndarray'>
100    a
101    e
102    i
103    o
104    u
dtype: object
<class 'pandas.core.series.Series'>


In [6]:
# Reuse `data` from the previous cell; the index is passed as the second argument
a = np.arange(100,105)
s = pd.Series(data, index=a)
print(s)
print(type(s))

100    a
101    e
102    i
103    o
104    u
dtype: object
<class 'pandas.core.series.Series'>


# **From Dictionary**

In [7]:
import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
print(type(data))
s = pd.Series(data)
print(s)
print(type(s))


<class 'dict'>
a    0.0
b    1.0
c    2.0
dtype: float64
<class 'pandas.core.series.Series'>


In [8]:
import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','a'])
print(s)


b    1.0
c    2.0
a    0.0
dtype: float64


In [9]:
import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
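
The NaN above appears because label 'd' has no matching key in the dictionary. A minimal sketch of how such gaps can be detected and handled follows. The fill value 0.0 is an arbitrary choice for the example.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(data, index=["b", "c", "d", "a"])

# isna() marks the labels that had no matching key in the dictionary.
missing = s.isna()
print(missing)

# fillna() returns a new Series with the gaps replaced;
# dropna() removes them instead.
filled = s.fillna(0.0)
dropped = s.dropna()
print(filled)
print(dropped)
```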


# **From Scalar**

In [10]:
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)


0    5
1    5
2    5
3    5
dtype: int64


In [11]:
import pandas as pd
import numpy as np
s = pd.Series('a', index=[0, 1, 2, 3])
print(s)


0    a
1    a
2    a
3    a
dtype: object


# **Access Series with position**

In [12]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s)
# retrieve the first element by position
print(s.iloc[0])


a    1
b    2
c    3
d    4
e    5
dtype: int64
1


In [13]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s)
# retrieve the third element by position
print(s.iloc[2])

a    1
b    2
c    3
d    4
e    5
dtype: int64
3


In [14]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s)
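# retrieve several elements by position with slices
print(s.iloc[0:])   # all elements
print(s.iloc[2:])   # from position 2 to the end
print(s.iloc[:1])   # the first element
print(s.iloc[:3])   # the first three elements
print(s.iloc[-3:])  # the last three elements
print(s.iloc[:-3])  # all but the last three elements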


a    1
b    2
c    3
d    4
e    5
dtype: int64
a    1
b    2
c    3
d    4
e    5
dtype: int64
c    3
d    4
e    5
dtype: int64
a    1
dtype: int64
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
a    1
b    2
dtype: int64


# **Retrieve data using index (labels)**

In [15]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [16]:
print(s['b'])

2
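
Position-based and label-based access can be made explicit with `.iloc` and `.loc`. The sketch below reuses the same Series and highlights one difference that is easy to miss: a position slice excludes its end, while a label slice includes it.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Position-based access: the end of the slice is excluded.
by_position = s.iloc[0:2]      # elements at positions 0 and 1
# Label-based access: the end label is included.
by_label = s.loc["a":"c"]      # labels 'a', 'b' and 'c'
# Several labels at once, passed as a list.
picked = s[["a", "c", "e"]]

print(by_position)
print(by_label)
print(picked)
```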


In [17]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

0    1.444777
1    0.109798
2   -1.026567
3   -0.639889
dtype: float64


In [18]:
print("axes:")
print(s.axes)

axes:
[RangeIndex(start=0, stop=4, step=1)]


In [19]:
s = pd.Series(np.random.randn(4))
print(s)
print ("Is the Object empty?")
print(s.empty)

0    1.159450
1   -1.227221
2   -0.438324
3   -0.150246
dtype: float64
Is the Object empty?
False


In [20]:
s = pd.Series(np.random.randn(10))
print(s)
print ("The dimensions of the object:")
print(s.ndim)


0   -0.806750
1   -0.066413
2    0.229462
3   -1.165203
4    1.654997
5   -1.009233
6    0.526440
7    0.171275
8   -1.118827
9    0.031204
dtype: float64
The dimensions of the object:
1


In [22]:
s = pd.Series(np.random.randn(10))
print(s)
print ("The size of the object:")
print(s.size)


0   -0.012007
1    0.151293
2    0.887137
3    0.159632
4   -0.227553
5   -1.097296
6   -0.089453
7   -0.499558
8   -0.010527
9   -0.349736
dtype: float64
The size of the object:
10


In [24]:
s = pd.Series(np.random.randn(4))
print(s)
print ("The actual data series is:")
l = s.values
print(l)
print(type(l))


0    1.678780
1    1.668988
2   -2.270569
3   -0.227429
dtype: float64
The actual data series is:
[ 1.67878049  1.66898801 -2.2705694  -0.22742877]
<class 'numpy.ndarray'>
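
The attributes shown in the cells above can be collected in one place. This sketch uses fixed values instead of random ones so the results are predictable, and it also shows `to_numpy()`, which returns the values as a NumPy array in the same way as `values`.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

print(s.index)       # the labels
print(s.dtype)       # element type, float64 here
print(s.shape)       # (3,), a one-element tuple because a Series is 1-D
arr = s.to_numpy()   # NumPy array of the values, like s.values
print(arr)
```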


# **Head and Tail**

In [25]:
s = pd.Series(np.random.randn(4))
print ("The original series is:")
print(s)
print ("The first two rows of the data series:")
print(s.head(2))


The original series is:
0    1.912007
1   -0.581676
2    0.234244
3    0.927728
dtype: float64
The first two rows of the data series:
0    1.912007
1   -0.581676
dtype: float64


In [26]:
s = pd.Series(np.random.randn(10))
print("The first five rows of the data series:")
print(s.head())


The first five rows of the data series:
0   -0.535387
1   -0.795072
2    1.018451
3    0.999646
4   -0.476918
dtype: float64


In [27]:
s = pd.Series(np.random.randn(10))
print("The first three rows of the data series:")
print(s.head(3))

The first three rows of the data series:
0   -0.316929
1   -1.754089
2    1.117887
dtype: float64


In [28]:
s = pd.Series(np.random.randn(10))
print("The last five rows of the data series:")
print(s.tail())

The last five rows of the data series:
5    0.065244
6   -0.009855
7    1.851912
8    0.019836
9    1.441875
dtype: float64


In [29]:
s = pd.Series(np.random.randn(10))
print("The last two rows of the data series:")
print(s.tail(2))

The last two rows of the data series:
8    1.558959
9   -0.294727
dtype: float64
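
`head()` and `tail()` return five rows when called without an argument, and they also accept a negative count. The sketch below uses a fixed Series so the results are easy to check.

```python
import pandas as pd

s = pd.Series(range(10))

print(s.head(3).tolist())    # first three values
print(s.tail(3).tolist())    # last three values
print(s.head(-2).tolist())   # everything except the last two
print(s.tail(-2).tolist())   # everything except the first two
```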


# **DataFrame**

In [30]:
df = pd.DataFrame()
print(df)
print(type(df))

Empty DataFrame
Columns: []
Index: []
<class 'pandas.core.frame.DataFrame'>


In [31]:
data = [['Alex',30],['Bob',42],['Clarke',33]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

     Name  Age
0    Alex   30
1     Bob   42
2  Clarke   33


# **Create a DataFrame from Dict of ndarrays / Lists**

In [32]:
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
print(data)
print(type(data))
df = pd.DataFrame(data)
print(df)
print(type(df))


{'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
<class 'dict'>
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
<class 'pandas.core.frame.DataFrame'>


In [34]:
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Percentage':[98,97,96,95]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Percentage
rank1    Tom          98
rank2   Jack          97
rank3  Steve          96
rank4  Ricky          95


In [35]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
print(data)
print(type(data))
df = pd.DataFrame(data)
print(df)

[{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
<class 'list'>
   a   b     c
0  1   2   NaN
1  5  10  20.0


In [36]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [38]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# All three columns, with names matching the dictionary keys
df0 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b','c'])
print(df0)
# Two columns, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# Two columns, one of which ('b1') matches no key and is therefore filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

        a   b     c
first   1   2   NaN
second  5  10  20.0
        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


# **Create a DataFrame from Dict of Series**

In [39]:
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
print(d)
df = pd.DataFrame(d)
print(df)


{'one': a    1
b    2
c    3
dtype: int64, 'two': a    1
b    2
c    3
d    4
dtype: int64}
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4


In [41]:
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)
print(df ['one'])


   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64


In [42]:
print(df ['two'])

a    1
b    2
c    3
d    4
Name: two, dtype: int64
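
The two columns end up with different dtypes because column 'one' has no value for label 'd': the gap is stored as NaN, which turns the column into float64. The sketch below checks this and, as an optional illustration, converts the column to the nullable "Int64" dtype.

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# 'one' has no value for label 'd', so it holds NaN and becomes float64.
print(df.dtypes)

# Assumption for illustration: converting to the nullable "Int64" dtype
# keeps whole numbers and marks the gap as a missing value.
one_int = df["one"].astype("Int64")
print(one_int)
```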
