### ***Pandas***
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is well suited to:
- Tabular data
- Ordered and unordered time series data
- Arbitrary (homogeneous or heterogeneous) matrix data
- Observational or statistical data

### 2 Types of Data Structures in Pandas
- Series (1D)
- DataFrame (2D)

### Key Features of Pandas
- Handling of missing data
- Size mutability: columns can be inserted into and deleted from a DataFrame
- Automatic and explicit data alignment
- Powerful group-by functionality for split-apply-combine operations on data sets
- Slicing, fancy indexing, and subsetting of large data sets
- Merging and joining of data sets
- Reshaping and pivoting of data sets
- Hierarchical labeling of axes
- Loading data from flat files and Excel, and saving/loading data in HDF5 format
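
Two of the features listed above, automatic alignment and missing-data handling, can be seen together in a small sketch. The Series `a` and `b` and their labels are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Two hypothetical Series whose index labels only partly overlap
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition aligns on index labels, not on position;
# a label present in only one Series produces NaN
total = a + b

# Missing values can then be filled or dropped explicitly
filled = total.fillna(0)
dropped = total.dropna()

print(total)
print(filled)
print(dropped)
```

Only the labels `y` and `z` exist in both Series, so only those two positions hold real sums.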

### ***Series***
- 1D labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

In [2]:
import pandas as pd
s1=pd.Series([12,64,45,89,14])

In [3]:
s1

0    12
1    64
2    45
3    89
4    14
dtype: int64

In [4]:
type(s1)

pandas.core.series.Series

In [5]:
s2=pd.Series([12,64,45,89.3,14])

In [6]:
s2

0    12.0
1    64.0
2    45.0
3    89.3
4    14.0
dtype: float64

In [7]:
s1=pd.Series([12,64,45,89,14],index=["*","**","***","****","*****"])
s1

*        12
**       64
***      45
****     89
*****    14
dtype: int64

In [8]:
s1["****"]

np.int64(89)

In [9]:
s1[["*","**","***"]]

*      12
**     64
***    45
dtype: int64

In [10]:
s1=pd.Series([12,64,45,89.3,14],index=["a","b","c","d","e"])
s1

a    12.0
b    64.0
c    45.0
d    89.3
e    14.0
dtype: float64

In [11]:
s1[["a","c"]]

a    12.0
c    45.0
dtype: float64

In [13]:
import numpy as np
d={1001:"Hritik",1002:"Ramesh",1003:"Rohit",1004:"Virat"}
roll=np.arange(1001,1005,1)
print(roll)
s1=pd.Series(d,index=roll)
s1

[1001 1002 1003 1004]


1001    Hritik
1002    Ramesh
1003     Rohit
1004     Virat
dtype: object
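
When a Series is built from a dict and an explicit `index` is passed, pandas looks up each index label in the dict. The sketch below assumes a hypothetical roll number `1005` that has no entry in the dict, to show what happens to unmatched labels.

```python
import pandas as pd

names = {1001: "Hritik", 1002: "Ramesh", 1003: "Rohit", 1004: "Virat"}

# 1005 is a made-up roll number with no entry in the dict
s = pd.Series(names, index=[1002, 1004, 1005])

# Labels found in the dict keep their value; 1005 becomes NaN,
# and dict keys not listed in the index (1001, 1003) are left out
print(s)
```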

### DataFrame

A DataFrame is a 2D labeled data structure with columns of potentially different types.

In [14]:
data={
    "country":["India","China","USA","India","China","USA","India","China","USA"],
    "year":[2015,2015,2015,2016,2016,2016,2017,2017,2017],
    "population":[30,35,26,31,32,27,33,30,24],
}

In [15]:
d1=pd.DataFrame(data)
d1

   country  year  population
0    India  2015          30
1    China  2015          35
2      USA  2015          26
3    India  2016          31
4    China  2016          32
5      USA  2016          27
6    India  2017          33
7    China  2017          30
8      USA  2017          24


In [16]:
d1["year"]

0    2015
1    2015
2    2015
3    2016
4    2016
5    2016
6    2017
7    2017
8    2017
Name: year, dtype: int64

In [17]:
d1["population"]

0    30
1    35
2    26
3    31
4    32
5    27
6    33
7    30
8    24
Name: population, dtype: int64

In [20]:
d1.country

0    India
1    China
2      USA
3    India
4    China
5      USA
6    India
7    China
8      USA
Name: country, dtype: object

In [21]:
d1.country=="India"

0     True
1    False
2    False
3     True
4    False
5    False
6     True
7    False
8    False
Name: country, dtype: bool

In [22]:
d1[d1.country=="India"]

   country  year  population
0    India  2015          30
3    India  2016          31
6    India  2017          33


In [23]:
d1[d1.population%2==0]

   country  year  population
0    India  2015          30
2      USA  2015          26
4    China  2016          32
7    China  2017          30
8      USA  2017          24


In [25]:
d1[(d1.year%2==0) & (d1.population%2==0)]

   country  year  population
4    China  2016          32


In [26]:
d1[(d1.year%2==0) | (d1.population%2==0)]

   country  year  population
0    India  2015          30
2      USA  2015          26
3    India  2016          31
4    China  2016          32
5      USA  2016          27
7    China  2017          30
8      USA  2017          24


In [27]:
d1[(d1["year"]%2==0) & (d1["population"]%2==0)]

   country  year  population
4    China  2016          32


In [28]:
d1[(d1["year"]%2==0) | (d1["population"]%2==0)]

   country  year  population
0    India  2015          30
2      USA  2015          26
3    India  2016          31
4    China  2016          32
5      USA  2016          27
7    China  2017          30
8      USA  2017          24
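
The boolean-mask filters above can also be written with `DataFrame.query`, and membership tests can use `Series.isin`. The following is a minimal sketch that rebuilds the same `d1` so it runs on its own; the names `mask_result`, `query_result`, and `asia` are illustrative only.

```python
import pandas as pd

d1 = pd.DataFrame({
    "country": ["India", "China", "USA"] * 3,
    "year": [2015] * 3 + [2016] * 3 + [2017] * 3,
    "population": [30, 35, 26, 31, 32, 27, 33, 30, 24],
})

# Boolean mask: each comparison needs its own parentheses,
# because & binds more tightly than ==
mask_result = d1[(d1["year"] % 2 == 0) & (d1["population"] % 2 == 0)]

# The same condition expressed as a query string
query_result = d1.query("year % 2 == 0 and population % 2 == 0")

# isin() keeps rows whose value appears in the given list
asia = d1[d1["country"].isin(["India", "China"])]

print(query_result)
print(asia)
```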


In [30]:
d1

   country  year  population
0    India  2015          30
1    China  2015          35
2      USA  2015          26
3    India  2016          31
4    China  2016          32
5      USA  2016          27
6    India  2017          33
7    China  2017          30
8      USA  2017          24


In [33]:
data["gdp"]=[4.5,2.3,6.3,4.9,6.1,5.4,4.7,5.5,5.9]
data

{'country': ['India',
  'China',
  'USA',
  'India',
  'China',
  'USA',
  'India',
  'China',
  'USA'],
 'year': [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
 'population': [30, 35, 26, 31, 32, 27, 33, 30, 24],
 'gdp': [4.5, 2.3, 6.3, 4.9, 6.1, 5.4, 4.7, 5.5, 5.9]}

In [34]:
d1

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [35]:
del data["gdp"]  # removes the key from the dict only; d1 is a separate object and keeps its gdp column

In [36]:
d1

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [38]:
del d1["gdp"]  # deletes the column from the DataFrame in place

In [39]:
d1

   country  year  population
0    India  2015          30
1    China  2015          35
2      USA  2015          26
3    India  2016          31
4    China  2016          32
5      USA  2016          27
6    India  2017          33
7    China  2017          30
8      USA  2017          24
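
`del d1["gdp"]` changes the DataFrame in place. As a possible alternative, `assign` and `drop` return new DataFrames and leave the original untouched. The sketch below uses a shortened three-row copy of the data; `with_gdp` and `without_gdp` are illustrative names.

```python
import pandas as pd

d1 = pd.DataFrame({
    "country": ["India", "China", "USA"],
    "year": [2015, 2015, 2015],
    "population": [30, 35, 26],
})

# assign() returns a new DataFrame with the extra column; d1 is unchanged
with_gdp = d1.assign(gdp=[4.5, 2.3, 6.3])

# drop() likewise returns a copy without the named column
without_gdp = with_gdp.drop(columns=["gdp"])

print(list(d1.columns))
print(list(with_gdp.columns))
print(list(without_gdp.columns))
```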


In [40]:
data

{'country': ['India',
  'China',
  'USA',
  'India',
  'China',
  'USA',
  'India',
  'China',
  'USA'],
 'year': [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
 'population': [30, 35, 26, 31, 32, 27, 33, 30, 24]}

In [41]:
d1["gdp"]=[4.5, 2.3, 6.3, 4.9, 6.1, 5.4, 4.7, 5.5, 5.9]
d1

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [44]:
d1.gdp.sum()

np.float64(45.6)

In [45]:
d1.count()

country       9
year          9
population    9
gdp           9
dtype: int64

In [46]:
d1.sum()

country       IndiaChinaUSAIndiaChinaUSAIndiaChinaUSA
year                                            18144
population                                        268
gdp                                              45.6
dtype: object

In [47]:
d1.population.sum()

np.int64(268)

In [48]:
d1.max()

country        USA
year          2017
population      35
gdp            6.3
dtype: object
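
Note that `d1.sum()` above concatenates the strings in `country` and returns an `object` Series. If only the numeric columns are of interest, one option is to pass `numeric_only=True`, or to call `describe()`, which summarizes numeric columns by default. A minimal sketch, rebuilding `d1` so it runs on its own:

```python
import pandas as pd

d1 = pd.DataFrame({
    "country": ["India", "China", "USA"] * 3,
    "year": [2015] * 3 + [2016] * 3 + [2017] * 3,
    "population": [30, 35, 26, 31, 32, 27, 33, 30, 24],
    "gdp": [4.5, 2.3, 6.3, 4.9, 6.1, 5.4, 4.7, 5.5, 5.9],
})

# Restrict the aggregation to numeric columns; country is skipped
totals = d1.sum(numeric_only=True)

# describe() reports count, mean, std, min, quartiles and max per numeric column
stats = d1.describe()

print(totals)
print(stats)
```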

In [49]:
d1

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [50]:
d1.gdp.count()

np.int64(9)

In [51]:
d1.head()

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1


In [52]:
d1.head(4)

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9


In [56]:
d1.head(1)

   country  year  population  gdp
0    India  2015          30  4.5


In [58]:
d1.tail()

   country  year  population  gdp
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [59]:
d1.tail(3)

   country  year  population  gdp
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [60]:
d1.loc[5]

country        USA
year          2016
population      27
gdp            5.4
Name: 5, dtype: object

In [61]:
d1

   country  year  population  gdp
0    India  2015          30  4.5
1    China  2015          35  2.3
2      USA  2015          26  6.3
3    India  2016          31  4.9
4    China  2016          32  6.1
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
8      USA  2017          24  5.9


In [62]:
d1.loc[[5,6,7]]

   country  year  population  gdp
5      USA  2016          27  5.4
6    India  2017          33  4.7
7    China  2017          30  5.5
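
`loc` selects by index label. With the default integer index, labels and positions happen to coincide, so the difference from position-based `iloc` only becomes visible after filtering. The sketch below rebuilds `d1` (without the `gdp` column, for brevity) and uses an illustrative filtered frame `india`.

```python
import pandas as pd

d1 = pd.DataFrame({
    "country": ["India", "China", "USA"] * 3,
    "year": [2015] * 3 + [2016] * 3 + [2017] * 3,
    "population": [30, 35, 26, 31, 32, 27, 33, 30, 24],
})

# After filtering, the surviving rows keep their original labels 0, 3, 6
india = d1[d1["country"] == "India"]

by_label = india.loc[3]      # row whose index label is 3
by_position = india.iloc[1]  # second row by position (the same row here)

# Label slices include both endpoints; position slices exclude the stop
label_slice = d1.loc[5:7]
position_slice = d1.iloc[5:7]

print(by_label)
print(len(label_slice), len(position_slice))
```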


In [67]:
import pandas as pd
data=pd.read_excel("D:\\Github_Projects\\Python-Assignment-Practise\\pandas\\hospital_data_set.xls")

In [68]:
data.head()

      id  name                  dob       phone          emal         pid  gender        diseases  age
0  11111  bbb1  1950-10-12 00:00:00  1234567890  bbb1@xxx.com  1111111111       M        Diabetes   78
1  11112  bbb2  1984-10-12 00:00:00  1234567890  bbb2@xxx.com  1111111112       F            Cold   67
2  11113  bbb3          712/11/1940  1234567890  bbb3@xxx.com  1111111113       M           Fever   90
3  11114  bbb4  1950-12-12 00:00:00  1234567890  bbb4@xxx.com  1111111114       F            Cold   88
4  11115  bbb5           12/13/1960  1234567890  bbb5@xxx.com  1111111115       M  Blood Pressure   76


In [69]:
print(data[["phone","emal","pid"]].head())

        phone          emal         pid
0  1234567890  bbb1@xxx.com  1111111111
1  1234567890  bbb2@xxx.com  1111111112
2  1234567890  bbb3@xxx.com  1111111113
3  1234567890  bbb4@xxx.com  1111111114
4  1234567890  bbb5@xxx.com  1111111115


In [71]:
d=input("Enter the disease for which you want to check the total no. of patients :")
total=data[data.diseases==d].id.count()
total

np.int64(22773)
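
Instead of filtering for one disease at a time, `value_counts()` returns the number of rows for every disease in a single call. The sketch below uses a tiny made-up `patients` frame with the same `diseases` column name, since the hospital file is not available here.

```python
import pandas as pd

# Made-up sample; only the column name matches the hospital data
patients = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "diseases": ["Cold", "Cold", "Fever", "Cold", "Fever", "Diabetes"],
})

# One count per distinct disease, sorted from most to least frequent
counts = patients["diseases"].value_counts()

cold_total = counts["Cold"]
# .get() supplies a default for a disease that does not appear at all
malaria_total = counts.get("Malaria", 0)

print(counts)
print(cold_total, malaria_total)
```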

In [99]:
for i in data.diseases.unique():
    # Count male and female patients separately for each disease
    is_disease=data["diseases"]==i
    malecount =data[is_disease & (data["gender"]=="M")].id.count()
    femalecount=data[is_disease & (data["gender"]=="F")].id.count()
    print(i," :\n\t%-10s%-10d\n\t%-10s%-10d\n"%("Male",malecount,"Female",femalecount),sep="")

Diabetes :
	Male      5603      
	Female    175       

Cold :
	Male      5852      
	Female    16921     

Fever :
	Male      4360      
	Female    8712      

Blood Pressure :
	Male      1096      
	Female    2         

PCOS :
	Male      3817      
	Female    0         

Swine Flu :
	Male      132       
	Female    265       
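
The per-disease loop above is an instance of split-apply-combine, so it could also be expressed with `groupby` or `pd.crosstab`. This is a sketch on a small made-up `patients` frame that reuses the `id`, `gender`, and `diseases` column names; it does not reproduce the counts from the hospital file.

```python
import pandas as pd

# Made-up sample using the same column names as the hospital data
patients = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "gender": ["M", "F", "M", "F", "F", "M"],
    "diseases": ["Cold", "Cold", "Fever", "Cold", "Fever", "Diabetes"],
})

# Split by (disease, gender), count ids, then pivot gender into columns;
# fill_value=0 covers combinations that never occur
counts = (
    patients.groupby(["diseases", "gender"])["id"]
    .count()
    .unstack(fill_value=0)
)

# crosstab builds the same disease-by-gender table directly
table = pd.crosstab(patients["diseases"], patients["gender"])

print(counts)
print(table)
```

Both forms avoid re-filtering the whole DataFrame once per disease and gender.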

