## Introduction
**What is Pandas?**

- Pandas is a widely used Python library, built on NumPy, for handling tabular data and for data manipulation and analysis.

- It stores data as a Python object with rows and columns, much like data stored in an Excel file. This makes the data easier to inspect and work with than data held in lists or dictionaries.

- Its key data structure is the **`DataFrame`**. Its advantage over NumPy is that it handles multiple data types (e.g., strings), not only numerical data, although this makes it slower than NumPy.


**How to install Pandas?**

- If you are using Anaconda, Pandas is preinstalled in the base environment.

- Otherwise, install it with either conda or pip (one of the two commands is enough):

```
conda install pandas
pip install pandas
```

**How to import Pandas?**

In [3]:
# import pandas, pd is an alias
import pandas as pd

In [4]:
#Checking the version of Pandas
print(pd.__version__)

1.4.4


- Main data structures in Pandas
    -   **Data Series** - like a `column in a table`. It is a
        one-dimensional array holding data of any type.
    -   **Data Frame** - tabular data with multiple rows and columns.

## Series
- A Series is similar to a 1-D NumPy array and contains scalar values of the same type (numeric, character, datetime, etc.).
- A DataFrame is simply a table where each column is a pandas Series.
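
As a small sketch of that relationship, selecting a single column from a DataFrame gives back a Series. The column names and values below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration
df = pd.DataFrame({"name": ["A", "B", "C"], "score": [1.5, 2.5, 3.5]})

col = df["score"]      # select one column by name
print(type(col))       # the column is a pandas Series
print(col.dtype)       # every value in the column shares one dtype
```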

### Creating Pandas Series

- Series are one-dimensional array-like structures, though unlike numpy arrays, they often contain non-numeric data (characters, dates, time, booleans etc.)

- You can create a pandas Series by passing an array-like object, such as a list, a NumPy array, or a dictionary, to `pd.Series()`.

**Creating series from a list**

In [6]:
list1=[100,200,300,400,500]

In [7]:
s1 = pd.Series(list1)

In [8]:
s1

0    100
1    200
2    300
3    400
4    500
dtype: int64

In [9]:
type(s1)

pandas.core.series.Series

**Creating series from numpy array**

In [11]:
#import numpy, np is an alias
import numpy as np

In [12]:
# Creating a numpy array
arr= np.random.randint(10,40,(8,))
print(arr)
print(type(arr))

[16 36 27 28 20 19 17 20]
<class 'numpy.ndarray'>


In [13]:
s2=pd.Series(arr)
print(s2)
print(type(s2))

0    16
1    36
2    27
3    28
4    20
5    19
6    17
7    20
dtype: int32
<class 'pandas.core.series.Series'>


#### Explicitly specifying indices

You might have noticed that when a series is created, Pandas automatically indexes it from 0 to (n-1), n being the number of rows. If we want, we can set the index explicitly with the `index` argument of `pd.Series()`.

**Creating a Series with the index**

In [16]:
s1 = pd.Series(list1,index=['a','b','c','d','e'])   
# here length of the values should match the length of the index

In [17]:
s1

a    100
b    200
c    300
d    400
e    500
dtype: int64
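
The length requirement mentioned in the comment above can be checked directly. The sketch below deliberately passes too few labels and catches the resulting error; the variable `mismatch_raised` is a hypothetical flag added only to make the outcome visible.

```python
import pandas as pd

values = [100, 200, 300, 400, 500]
mismatch_raised = False   # hypothetical flag, only for this demonstration

try:
    # Three labels for five values: pandas refuses to build the Series
    pd.Series(values, index=["a", "b", "c"])
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

print(mismatch_raised)
```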

**Creating a series of characters**
- Notice that the 'dtype' of characters is 'object'

In [18]:
# creating a series of characters
# notice that the 'dtype' here is 'object'
char_series = pd.Series(['a', 'b', 'af'])
char_series

0     a
1     b
2    af
dtype: object

**Creating a series with a dictionary**

In [19]:
# using dictionary

dic1 = {
    
        "A" : "APPLE",
        "B" : "BAT",
        "C" : "CAR",
        "D" : "DOG"
     
}

In [20]:
ds1 = pd.Series(dic1)
print(ds1)
print(type(ds1))

A    APPLE
B      BAT
C      CAR
D      DOG
dtype: object
<class 'pandas.core.series.Series'>


#### Indexing Series

By default, a Series is indexed like a 1-D NumPy array: positions start at 0. When a Series has custom labels, its elements can also be selected by label.

In [24]:
s = pd.Series([10,20,30,46,7])   #Creating a Series
s

0    10
1    20
2    30
3    46
4     7
dtype: int64

In [25]:
# Indexing pandas series: Same as indexing 1-d numpy arrays or lists
# accessing the fourth element
s[3]

46

In [26]:
# accessing elements starting index = 2 till the end
s[2:]

2    30
3    46
4     7
dtype: int64

In [27]:
# accessing the second and the fourth elements
# note that s[1, 3] will not work, you need to pass the indices [1, 3] as a list inside the original []
s[[1, 3]]

1    20
3    46
dtype: int64

In [28]:
ds1

A    APPLE
B      BAT
C      CAR
D      DOG
dtype: object

In [29]:
ds1["A"]

'APPLE'

In [30]:
ds1.iloc[0]    # access by position; plain ds1[0] on a labelled Series is deprecated in pandas 2.x

'APPLE'
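
To keep label-based and position-based access apart, pandas offers the `.loc` and `.iloc` accessors. The sketch below rebuilds the same dictionary-based Series under the name `ds`, and adds a second Series `s` with an integer index starting at 5 (an assumption made only to show where the two kinds of access differ).

```python
import pandas as pd

ds = pd.Series({"A": "APPLE", "B": "BAT", "C": "CAR", "D": "DOG"})

print(ds.loc["A"])    # lookup by label
print(ds.iloc[0])     # lookup by position

# With integer labels that do not start at 0, label and position diverge
s = pd.Series([10, 20, 30], index=[5, 6, 7])
print(s.loc[5])       # label 5 is the first element
print(s.iloc[0])      # position 0 is also the first element
```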

**Slicing Series**

In [31]:
s

0    10
1    20
2    30
3    46
4     7
dtype: int64

In [32]:
s[1 : 4]   #slicing the rows from 1 to 3

1    20
2    30
3    46
dtype: int64

In [33]:
s[1 : 4 : 2]   #slicing using step 2

1    20
3    46
dtype: int64

In [34]:
ds1

A    APPLE
B      BAT
C      CAR
D      DOG
dtype: object

In [36]:
ds1["B":"D"]    # slicing by label from "B" to "D"; the end label is included

B    BAT
C    CAR
D    DOG
dtype: object
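
The two slices above behave differently at the end point: a positional slice stops before the end position, while a label slice includes the end label. A minimal sketch of that contrast, using the explicit `.iloc` and `.loc` accessors on a rebuilt copy of the Series (named `ds` here):

```python
import pandas as pd

ds = pd.Series({"A": "APPLE", "B": "BAT", "C": "CAR", "D": "DOG"})

by_position = ds.iloc[1:3]    # positions 1 and 2; position 3 is excluded
by_label = ds.loc["B":"D"]    # labels "B", "C" and "D"; "D" is included

print(list(by_position.index))
print(list(by_label.index))
```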

**Adding two series**

In [38]:
sa = pd.Series([10,20,30,40,50])
sb = pd.Series([1,2,3,4,5])            #Creating two series

In [39]:
sa

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [40]:
sb

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [41]:
sa+ sb        #adding two series

0    11
1    22
2    33
3    44
4    55
dtype: int64
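
The addition above works element by element because both series share the same index. When the indexes differ, pandas aligns values by label and fills unmatched labels with NaN. The following sketch uses made-up labels to show this, along with `Series.add(..., fill_value=0)` as one way to treat a missing side as 0.

```python
import pandas as pd

sa = pd.Series([10, 20, 30], index=["a", "b", "c"])
sb = pd.Series([1, 2, 3], index=["b", "c", "d"])

total = sa + sb                      # aligned on labels; "a" and "d" have no partner
print(total)

filled = sa.add(sb, fill_value=0)    # a missing value on either side counts as 0
print(filled)
```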

------------------------------------------------------------------------------------------------------

## Dataframe

- Dataframe is the most widely used data-structure in data analysis. 

- It is a table with rows and columns: rows have an index and columns have meaningful names.

**Syntax to create dataframe**

> `df = pd.DataFrame(data)`, where `data` can be a list, a Series, an array, or a dictionary

**Creating a DataFrame using list**

In [51]:
list1=[10,20,30,40,50]
print(list1)
print(type(list1))               #Creating a List

[10, 20, 30, 40, 50]
<class 'list'>


In [43]:
df=pd.DataFrame(list1)
df

    0
0  10
1  20
2  30
3  40
4  50


In [47]:
df=pd.DataFrame(list1,columns=["Column1"])    #Assigning column name-- "Column1"
df

   Column1
0       10
1       20
2       30
3       40
4       50


In [48]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


**Creating a DataFrame using Series**

In [50]:
s1 = pd.Series([10,20,30,40,50])
print(s1)                                #Creating a Series
type(s1)

0    10
1    20
2    30
3    40
4    50
dtype: int64


pandas.core.series.Series

In [52]:
df1=pd.DataFrame(s1)

In [53]:
df1

    0
0  10
1  20
2  30
3  40
4  50


In [54]:
type(df1)

pandas.core.frame.DataFrame

**Creating a DataFrame using Dictionaries**

In [55]:
dic1 = {
    
    "Name" : ["A","B","C","D"],
    "Designation" : ["PM","Dev","TM","Tester"],
    'Location' : ['Delhi',"Hyderabad", "Chennai", "Vizag"],        #Creating a Dictionary
    'Salary' : [100000,80000,90000,60000]
    
}

In [56]:
df2=pd.DataFrame(dic1)

In [57]:
df2

  Name Designation   Location  Salary
0    A          PM      Delhi  100000
1    B         Dev  Hyderabad   80000
2    C          TM    Chennai   90000
3    D      Tester      Vizag   60000


**Creating indexed dataframe**

In [58]:
df3=pd.DataFrame(dic1,index = ['person1', 'person2', 'person3', 'person4'])    #Passing indexes

In [59]:
df3

        Name Designation   Location  Salary
person1    A          PM      Delhi  100000
person2    B         Dev  Hyderabad   80000
person3    C          TM    Chennai   90000
person4    D      Tester      Vizag   60000


**Creating a DataFrame with NaN values**   

- A NaN value represents a missing (null) value. In code it is written as `np.nan`.

In [61]:
dic1 = {
    
    "Name" : ["A",np.nan,"C","D"],
    "Designation" : ["PM",np.nan,"TM","Tester"],                #Creating a Dictionary
    'Location' : ['Delhi',"Hyderabad", "Chennai", "Vizag"],
    'Salary' : [100000,80000,np.nan,60000]
    
}

In [62]:
pd.DataFrame(dic1)

  Name Designation   Location    Salary
0    A          PM      Delhi  100000.0
1  NaN         NaN  Hyderabad   80000.0
2    C          TM    Chennai       NaN
3    D      Tester      Vizag   60000.0
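
Two things are worth checking in the output above: the `Salary` column became float because NaN is itself a float, and missing entries can be located with `isna()`. A short sketch on a two-column subset of the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", np.nan, "C", "D"],
    "Salary": [100000, 80000, np.nan, 60000],
})

print(df["Salary"].dtype)    # integers were upcast to float to hold NaN
print(df.isna())             # True wherever a value is missing
print(df.isna().sum())       # number of missing values per column
```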


#### Accessing column values in a DataFrame

In [68]:
df3

        Name Designation   Location  Salary
person1    A          PM      Delhi  100000
person2    B         Dev  Hyderabad   80000
person3    C          TM    Chennai   90000
person4    D      Tester      Vizag   60000


In [69]:
df3["Name"]    # accessing the "Name" column with single brackets

person1    A
person2    B
person3    C
person4    D
Name: Name, dtype: object

In [64]:
type(df3["Name"])    # single brackets return a Series

pandas.core.series.Series

In [71]:
df3[["Name"]]       # accessing the "Name" column with double brackets

        Name
person1    A
person2    B
person3    C
person4    D


In [66]:
type(df3[["Name"]])   # double brackets return a DataFrame

pandas.core.frame.DataFrame

In [67]:
df3[["Name","Salary"]]   # accessing multiple columns in a DataFrame

        Name  Salary
person1    A  100000
person2    B   80000
person3    C   90000
person4    D   60000


**Creating a DataFrame using numpy array**

In [73]:
ary1 = np.random.randint(100,200,(4,5))      # Creating an array using numpy
print(ary1)
print(type(ary1))

[[177 150 159 115 130]
 [135 191 119 146 108]
 [130 137 186 108 191]
 [198 175 139 114 123]]
<class 'numpy.ndarray'>


In [74]:
dfnp = pd.DataFrame(ary1)        #creating DataFrame using array

In [75]:
dfnp

     0    1    2    3    4
0  177  150  159  115  130
1  135  191  119  146  108
2  130  137  186  108  191
3  198  175  139  114  123


In [76]:
dfnp = pd.DataFrame(ary1,columns=["A","B","C","D","E"],index=["a","b","c","d"])  
# Creating a DataFrame from the same array, now assigning column names and an index

In [77]:
dfnp

     A    B    C    D    E
a  177  150  159  115  130
b  135  191  119  146  108
c  130  137  186  108  191
d  198  175  139  114  123


In [78]:
dfnp["A"]["a"]    # accessing by column name, then by index label

177

In [79]:
dfnp["D"]["d"]

114
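
The chained form `dfnp["A"]["a"]` first selects a column and then a row. A single-step alternative is `.loc[row, column]` (or `.at` for one scalar, `.iloc` for positions). The sketch below uses `np.arange` instead of random numbers, an assumption made so that the values are predictable.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random array: values 0..19 in a 4x5 grid
ary = np.arange(20).reshape(4, 5)
dfnp = pd.DataFrame(ary, columns=list("ABCDE"), index=list("abcd"))

print(dfnp.loc["a", "A"])    # row label first, then column label
print(dfnp.at["d", "D"])     # .at reads a single cell by labels
print(dfnp.iloc[3, 3])       # same cell by row and column position
```

Note the order: `.loc` takes the row label before the column label, the reverse of the chained form.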

**Creating a DataFrame using a list of tuples**

In [80]:
data = [('1/1/2019', 13, 6, 'Rain'),
       ('2/1/2019', 11, 7, 'Fog'),
       ('3/1/2019', 12, 8, 'Sunny'),
       ('4/1/2019', 8, 5, 'Snow'),           # Creating a DataFrame from a list of tuples
       ('5/1/2019', 9, 6 , 'Rain')]

df = pd.DataFrame(data)
df

          0   1  2      3
0  1/1/2019  13  6   Rain
1  2/1/2019  11  7    Fog
2  3/1/2019  12  8  Sunny
3  4/1/2019   8  5   Snow
4  5/1/2019   9  6   Rain


In [82]:
df=pd.DataFrame(data,columns=["Date","Temperature","Windspeed","Event"])     #assigning column names
df

       Date  Temperature  Windspeed  Event
0  1/1/2019           13          6   Rain
1  2/1/2019           11          7    Fog
2  3/1/2019           12          8  Sunny
3  4/1/2019            8          5   Snow
4  5/1/2019            9          6   Rain


**Creating a dataframe using a dictionary of series**

In [83]:
dic = {'Name' : pd.Series(['Tom','Jack','Steve','Ricky','Vin', 'James', 'Vin']),
       'Age' : pd.Series([25, 26, 25, 35, 23, 33, 31]),
       'Rating' : pd.Series([4.23, 4.1, 3.4, 5, 2.9, 4.7,3.1])}

df = pd.DataFrame(dic)               #Creating a DataFrame using dictionary of Series
df 

    Name  Age  Rating
0    Tom   25    4.23
1   Jack   26    4.10
2  Steve   25    3.40
3  Ricky   35    5.00
4    Vin   23    2.90
5  James   33    4.70
6    Vin   31    3.10
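
Because each value in the dictionary is a Series, pandas lines the columns up by index rather than by position. A sketch of what this means when the series have different lengths (the shortened `Age` series is an assumption made for illustration):

```python
import pandas as pd

dic = {
    "Name": pd.Series(["Tom", "Jack", "Steve"]),
    "Age": pd.Series([25, 26]),      # one value fewer than "Name"
}

df = pd.DataFrame(dic)    # rows are the union of both indexes
print(df)                 # the missing Age at index 2 shows up as NaN
```

Passing plain lists of unequal length in the dictionary would raise an error instead, so this alignment is specific to Series inputs.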
