### Date : 20-08-2020

## Day Objectives

## Pandas Library in Python
 - Series - One Dimensional
 - DataFrame - Two Dimensional
 
 - **Basic Attributes of Series**

    - Series.index - The index (axis labels) of the Series.

    - Series.array - The ExtensionArray of the data backing this Series or Index.

    - Series.values - Return Series as ndarray or ndarray-like depending on the dtype.

    - Series.dtype - Return the dtype object of the underlying data.

    - Series.shape - Return a tuple of the shape of the underlying data.

    - Series.nbytes - Return the number of bytes in the underlying data.

    - Series.ndim - Number of dimensions of the underlying data, by definition 1.

    - Series.size - Return the number of elements in the underlying data.
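As a quick check of the attributes listed above, here is a minimal sketch on a small integer Series. The variable name `s` and the explicit `dtype='int64'` are choices made for this illustration only.

```python
import pandas as pd

# Hypothetical example Series; dtype is fixed so nbytes is the same on every platform
s = pd.Series([10, 20, 30, 40, 50], dtype='int64')

print(s.dtype)         # int64
print(s.shape)         # (5,)
print(s.nbytes)        # 5 elements * 8 bytes each = 40
print(s.ndim)          # 1
print(s.size)          # 5
print(list(s.index))   # [0, 1, 2, 3, 4]
```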


- **Basic Attributes and Methods in DataFrames**

- DataFrame.index - The index (row labels) of the DataFrame.

- DataFrame.columns - The column labels of the DataFrame.

- DataFrame.dtypes - Return the dtypes in the DataFrame.

- DataFrame.info([verbose, buf, max_cols, …]) - Print a concise summary of a DataFrame.

- DataFrame.select_dtypes([include, exclude]) - Return a subset of the DataFrame’s columns based on the column dtypes.

- DataFrame.values - Return a Numpy representation of the DataFrame.

- DataFrame.axes - Return a list representing the axes of the DataFrame.

- DataFrame.ndim - Return an int representing the number of axes / array dimensions.

- DataFrame.size - Return an int representing the number of elements in this object.

- DataFrame.shape - Return a tuple representing the dimensionality of the DataFrame.
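The DataFrame attributes above can be tried on a tiny table. This is only a sketch: the data, the row labels, and the names `df` and `numeric` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data, used only for illustration
df = pd.DataFrame(
    {'Name': ['Asha', 'Ravi'], 'Age': [28, 35], 'Score': [7.5, 8.0]},
    index=['r1', 'r2'],
)

print(list(df.index))     # ['r1', 'r2']
print(list(df.columns))   # ['Name', 'Age', 'Score']
print(df.shape)           # (2, 3)
print(df.size)            # 6
print(df.ndim)            # 2
print(len(df.axes))       # 2 (row axis and column axis)
print(df.dtypes)          # one dtype per column

# Keep only the numeric columns
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))   # ['Age', 'Score']

df.info()   # prints a concise summary to stdout
```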

## Creating Series

In [11]:
import pandas as pd
s1 = pd.Series([1,2,3,4,5])
print(s1[2])    # element with index label 2
print(s1[2:])   # slice from position 2 to the end

3
2    3
3    4
4    5
dtype: int64


In [9]:
s2 = pd.Series(['a','b','c','d'])
s2
print(s2.values)
print(s2.index)
print(s2.shape)
print(s2.size)
print(s2.ndim)

['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)
(4,)
4
1


In [19]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
ind  = [11,22,33,44]
s = pd.Series(data,index = ind)
print(s)

11    a
22    b
33    c
44    d
dtype: object


## Indexing Series

In [13]:
import pandas as pd
s3 = pd.Series([1,3,5,7])
print(s3)

0    1
1    3
2    5
3    7
dtype: int64


In [18]:
# Series of numbers with letters as index labels
import pandas as pd
pd.Series([3,5,7],index = ['a','b','c'])

a    3
b    5
c    7
dtype: int64
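With non-default labels it helps to be explicit about whether a lookup is by label or by position. Here is a minimal sketch reusing the values from the cell above. `.loc` and `.iloc` are standard pandas accessors, and the variable name `s` is chosen for this illustration.

```python
import pandas as pd

s = pd.Series([3, 5, 7], index=['a', 'b', 'c'])

print(s.loc['b'])                # label-based lookup -> 5
print(s.iloc[0])                 # position-based lookup -> 3
print(s.loc['a':'b'].tolist())   # label slices include the end label -> [3, 5]
print(s.iloc[0:2].tolist())      # position slices exclude the end -> [3, 5]
```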

In [21]:
# Series of numbers indexed by dates (2020-08-17 to 2020-08-21)
pd.Series([20,21,22,23,24],index = pd.date_range('20200817','20200821'))

2020-08-17    20
2020-08-18    21
2020-08-19    22
2020-08-20    23
2020-08-21    24
Freq: D, dtype: int64

In [22]:
pd.Series(pd.date_range('20200803','20200809'),index = ['Day 1','Day 2','Day 3','Day 4','Day 5','Day 6','Day 7'])

Day 1   2020-08-03
Day 2   2020-08-04
Day 3   2020-08-05
Day 4   2020-08-06
Day 5   2020-08-07
Day 6   2020-08-08
Day 7   2020-08-09
dtype: datetime64[ns]

In [30]:
# Squares of the numbers 1 to 10, indexed by the numbers themselves, using NumPy
import numpy as np
import pandas as pd
s1 = pd.Series(np.array(range(1,11))**2,index = range(1,11))
s1

1       1
2       4
3       9
4      16
5      25
6      36
7      49
8      64
9      81
10    100
dtype: int32

In [25]:
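# Checking for missing values with pd.isna
import numpy as np
import pandas as pd
print(pd.isna('dog'))    # an ordinary string is not missing
print(pd.isna(pd.NA))    # pd.NA is the pandas missing-value marker
print(pd.isna(np.nan))   # NaN counts as missing

False
True
True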

## Creating and Finding the missing values

In [21]:
import numpy as np
import pandas as pd
array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
print(array)
print(pd.isna(array))

[[ 1. nan  3.]
 [ 4.  5. nan]]
[[False  True False]
 [False False  True]]


## DataFrames

In [33]:
# Creating DataFrames by using Lists
import pandas as pd
data = [['Alex',30,'Doctor'],['Alisa',23,'Teacher'],['Alice',31,'Engineer']]
df = pd.DataFrame(data,columns = ['Name','Age','Occupation'],index = ['p1','p2','p3'])
df

|    | Name  | Age | Occupation |
|----|-------|-----|------------|
| p1 | Alex  | 30  | Doctor     |
| p2 | Alisa | 23  | Teacher    |
| p3 | Alice | 31  | Engineer   |


In [29]:
# Creating DF using Dictionaries
import pandas as pd
internal1 = {'s1':35,'s2':25,'s3':30}
print(pd.Series(internal1))
internal2 = {'s2':15,'s3':20,'s4':10}
print(pd.Series(internal2))

s1    35
s2    25
s3    30
dtype: int64
s2    15
s3    20
s4    10
dtype: int64


In [31]:
final = {'Internal1':internal1,'Internal2':internal2}
final = pd.DataFrame(final)
print(final)
pd.isna(final)

    Internal1  Internal2
s1       35.0        NaN
s2       25.0       15.0
s3       30.0       20.0
s4        NaN       10.0


|    | Internal1 | Internal2 |
|----|-----------|-----------|
| s1 | False     | True      |
| s2 | False     | False     |
| s3 | False     | False     |
| s4 | True      | False     |
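The `NaN` entries above appear because the two dictionaries have different keys: pandas aligns on the union of the labels and fills the gaps with missing values. Here is a small sketch of counting and dropping those gaps, assuming the same `internal1` and `internal2` data. The names `missing_per_column` and `complete` are introduced here for illustration.

```python
import pandas as pd

internal1 = {'s1': 35, 's2': 25, 's3': 30}
internal2 = {'s2': 15, 's3': 20, 's4': 10}
final = pd.DataFrame({'Internal1': internal1, 'Internal2': internal2})

# Number of missing values in each column
missing_per_column = final.isna().sum()
print(missing_per_column['Internal1'])   # 1
print(missing_per_column['Internal2'])   # 1

# Rows that have no missing value at all
complete = final.dropna()
print(list(complete.index))   # ['s2', 's3']
```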


In [36]:
d = {'Name':['Vijay','Vinay','Jane'],'Age':[23,25,37],'Occupation':['Engineer','Student','Doctor']}
df = pd.DataFrame(d)
df

|   | Name  | Age | Occupation |
|---|-------|-----|------------|
| 0 | Vijay | 23  | Engineer   |
| 1 | Vinay | 25  | Student    |
| 2 | Jane  | 37  | Doctor     |


## How Can We Read from a CSV File?
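
One common way to do this is `pd.read_csv`. The sketch below reads from an in-memory string through `io.StringIO` so that it runs without any file on disk. The CSV text and the name `csv_text` are made up for illustration. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """Name,Age,Occupation
Vijay,23,Engineer
Vinay,25,Student
Jane,37,Doctor
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)           # (3, 3)
print(list(df.columns))   # ['Name', 'Age', 'Occupation']
print(df['Age'].sum())    # 85
```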