# Pandas - DataFrame and Series

- Powerful data manipulation library
- pandas is widely used for data analysis and cleaning
- It provides two data structures:
  - Series (1-dimensional labeled array of values)
  - DataFrame (2-dimensional, size-mutable labeled data structure whose columns can have different types)
- It also provides data alignment, data merging, reshaping, and pivoting capabilities.
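"Data alignment" means that operations between pandas objects match values by index label rather than by position. The minimal sketch below shows this with two small Series; the labels and values are illustrative only.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (illustrative data)
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['z', 'y', 'w'])

# Values are added label by label, not position by position;
# labels that appear in only one Series ('w' and 'x') produce NaN
total = a + b
print(total)
```

Here `y` becomes 2 + 20 and `z` becomes 3 + 10, even though those values sit at different positions in the two Series.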


In [2]:
import pandas as pd

## Pandas Series:
- Series is a one-dimensional labeled array of values. It is similar to a list, but every value carries an index label, which enables automatic alignment by label and built-in handling of missing data (`NaN`).
- A Series can be created from a list, dictionary, or other iterable. It can also be created from a NumPy array or another Series.
- Series has methods for data manipulation, such as filtering, sorting, and grouping, and for statistical analysis, such as mean, median, and standard deviation.
- Similar to a single column in a table
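The missing-data handling, filtering, sorting, and statistics listed above can be sketched on a small Series. The values below are made up for illustration; `None` is stored as `NaN` because the Series holds floats.

```python
import pandas as pd

# A Series with one missing value (illustrative data)
s = pd.Series([4.0, None, 10.0, 6.0], index=['a', 'b', 'c', 'd'])

missing = s.isna().sum()     # count of missing values
mean = s.mean()              # NaN is skipped by default: (4 + 10 + 6) / 3
median = s.median()          # middle of 4, 6, 10
large = s[s > 5]             # boolean filtering keeps 'c' and 'd'
ordered = s.sort_values()    # ascending; NaN is placed last by default

print(missing, mean, median)
print(large)
print(ordered)
```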






#### Creating series from list

In [3]:
import pandas as pd
data = [1,2,3,4,5]
series = pd.Series(data)
print(series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
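As noted earlier, a Series can also be built from a NumPy array or from another Series. A minimal sketch, where the array values and the name `'values'` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# From a NumPy array: the dtype is taken from the array (float64 here)
arr = np.array([1.5, 2.5, 3.5])
from_array = pd.Series(arr, name='values')

# From another Series: values and index are carried over
from_series = pd.Series(from_array)

# Without an explicit index, pandas assigns the labels 0, 1, 2, ...
print(from_array.index.tolist())
print(from_series.dtype)
```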


#### Creating series from a dictionary
- The dictionary keys become the index of the Series

In [4]:
data = {'a':1,'b':2,'c':3}
series = pd.Series(data)
print(series)

a    1
b    2
c    3
dtype: int64
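When an explicit `index` is passed together with a dictionary, pandas uses only the listed labels, in the listed order. A label with no matching key gets `NaN`, which also turns the integer values into floats. The label `'d'` below is a hypothetical extra label chosen to show this.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# 'b' is left out, 'c' comes first, and 'd' has no matching key -> NaN
subset = pd.Series(data, index=['c', 'a', 'd'])
print(subset)
```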


The index labels can also be assigned explicitly with the `index` parameter (custom indexing)

In [5]:
data = [1,2,3,4,5]
index = ['a','b','c','z','y']
series = pd.Series(data,index = index)
print(series)

a    1
b    2
c    3
z    4
y    5
dtype: int64
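With a custom index, an element can be reached either by its label (`.loc`) or by its integer position (`.iloc`). A minimal sketch using the same Series; note that label slices include the end label while position slices exclude it.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'z', 'y'])

by_label = series.loc['z']          # label-based access
by_position = series.iloc[3]        # position-based access (4th element)
label_slice = series.loc['b':'z']   # includes the end label: b, c, z
pos_slice = series.iloc[1:3]        # excludes the end position: b, c
```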


## Pandas DataFrame:
- A 2-dimensional table with multiple rows and columns; each column is a Series

In [6]:
# Dictionary of lists: each key becomes a column name

data = {
  'Name':['Sangamesh','Nutan','Sam'],
  'age':[21,22,23],
  'location':['Bangaluru','Mumbai','Hydrabad']
}
df = pd.DataFrame(data)
print(df)

        Name  age   location
0  Sangamesh   21  Bangaluru
1      Nutan   22     Mumbai
2        Sam   23   Hydrabad
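A few attributes summarize the structure of the DataFrame just created. The sketch below rebuilds the same data so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # column labels, in insertion order
print(df.dtypes)          # one dtype per column; 'age' is int64
print(type(df['age']))    # each column is a pandas Series
```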


In [7]:
# List of dictionaries: each dictionary becomes a row
data = [{'name': 'John', 'age': 25},
        {'name': 'Alice ', 'age': 30},
        {'name': 'Bob', 'age': 35}]
print(pd.DataFrame(data)) 


     name  age
0    John   25
1  Alice    30
2     Bob   35


In [8]:
df = pd.read_csv('sampleData.csv')
print(df.head(5))

   Index      Customer Id First Name   Last Name                    Company  \
0      1  ffeCAb7AbcB0f07      Jared      Jarvis           Sanchez-Fletcher   
1      2  b687FfC4F1600eC      Marie      Malone                  Mckay PLC   
2      3  9FF9ACbc69dcF9c     Elijah     Barrera             Marks and Sons   
3      4  b49edDB1295FF6E     Sheryl  Montgomery  Kirby, Vaughn and Sanders   
4      5  3dcCbFEB17CCf2E     Jeremy     Houston             Lester-Manning   

             City                                       Country  \
0   Hatfieldshire                                       Eritrea   
1  Robertsonburgh                                      Botswana   
2         Kimbury                                      Barbados   
3     Briannaview  Antarctica (the territory South of 60 deg S)   
4   South Brianna                                    Micronesia   

                Phone 1               Phone 2  \
0    274.188.8773x41185  001-215-760-4642x969   
1          283-236-9529 
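`read_csv` also accepts parameters that control what gets loaded. To keep the sketch self-contained, it reads a tiny inline CSV through `io.StringIO` instead of `sampleData.csv`; the rows are copied from the first lines of the output above, and the column choice is illustrative.

```python
import io
import pandas as pd

# Inline stand-in for 'sampleData.csv' (a few columns from its first two rows)
csv_text = (
    "Index,First Name,Last Name,City\n"
    "1,Jared,Jarvis,Hatfieldshire\n"
    "2,Marie,Malone,Robertsonburgh\n"
)

# usecols loads only the listed columns; index_col turns 'Index' into the row labels
df = pd.read_csv(io.StringIO(csv_text),
                 usecols=['Index', 'First Name', 'City'],
                 index_col='Index')
print(df)
```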

In [9]:
# Re-create the small DataFrame, since df was overwritten by read_csv above
data = {
  'Name':['Sangamesh','Nutan','Sam'],
  'age':[21,22,23],
  'location':['Bangaluru','Mumbai','Hydrabad']
}
df = pd.DataFrame(data)
print(df)

        Name  age   location
0  Sangamesh   21  Bangaluru
1      Nutan   22     Mumbai
2        Sam   23   Hydrabad


In [10]:
# Selecting a single column returns a Series
df['Name']

0    Sangamesh
1        Nutan
2          Sam
Name: Name, dtype: object
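Passing a list of column names instead of a single name returns a DataFrame rather than a Series. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

one_col = df['Name']             # single label -> Series
two_cols = df[['Name', 'age']]   # list of labels -> DataFrame
print(two_cols)
```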

In [11]:
# Label-based row selection with .loc (0 is the row's index label)
df.loc[0]

Name        Sangamesh
age                21
location    Bangaluru
Name: 0, dtype: object
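`.loc` also takes a row selector and a column selector together, and the row selector can be a boolean condition. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

# Rows where age > 21, only the 'Name' column
older = df.loc[df['age'] > 21, 'Name']

# Rows labelled 0 and 2, two chosen columns
subset = df.loc[[0, 2], ['Name', 'location']]
print(older)
print(subset)
```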

In [12]:
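# Position-based selection with .iloc
df.iloc[0]       # first row as a Series
df.iloc[0, 2]    # row position 0, column position 2 in a single call


'Bangaluru'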
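`.iloc` accepts slices and lists of positions as well; slices exclude the end position, and negative positions count from the end. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

block = df.iloc[0:2, [0, 2]]   # rows 0 and 1, columns 'Name' and 'location'
last_row = df.iloc[-1]         # last row as a Series
print(block)
print(last_row)
```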

In [16]:
# Accessing a single element: .at uses labels, .iat uses integer positions
print(df.at[2, 'location'])
print(df.iat[2, 2])  # row position 2, column position 2

Hydrabad
Hydrabad
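`.at` and `.iat` can also be used on the left-hand side of an assignment to change one cell. In this sketch, `'Pune'` is a hypothetical new value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

# Overwrite one cell by label; reading it back by position gives the same cell
df.at[2, 'location'] = 'Pune'
print(df.iat[2, 2])
```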


#### Data manipulation using DataFrames

In [17]:
df

|   | Name | age | location |
|---|---|---|---|
| 0 | Sangamesh | 21 | Bangaluru |
| 1 | Nutan | 22 | Mumbai |
| 2 | Sam | 23 | Hydrabad |


In [42]:
# Adding a new column
df['Salary']= [50000,60000,70000]
df

|   | Name | age | location | Salary |
|---|---|---|---|---|
| 0 | Sangamesh | 21 | Bangaluru | 50000 |
| 1 | Nutan | 22 | Mumbai | 60000 |
| 2 | Sam | 23 | Hydrabad | 70000 |
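A new column can also be computed from existing ones, and `assign` returns a new DataFrame instead of modifying the original. The column names `Salary_k` and `age_in_months` are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})
df['Salary'] = [50000, 60000, 70000]

# Derived column, computed element-wise from another column
df['Salary_k'] = df['Salary'] // 1000

# assign() leaves df unchanged and returns a copy with the extra column
with_months = df.assign(age_in_months=df['age'] * 12)
print(with_months)
```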


In [43]:
# Deleting a column
df.drop('Salary',axis=1,inplace=True)
df


|   | Name | age | location |
|---|---|---|---|
| 0 | Sangamesh | 21 | Bangaluru |
| 1 | Nutan | 22 | Mumbai |
| 2 | Sam | 23 | Hydrabad |


`inplace=True` modifies the existing DataFrame instead of returning a new one, while `axis=1` tells `drop` to look for the label among the columns (the default, `axis=0`, drops rows).
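Without `inplace=True`, `drop` returns a new DataFrame and leaves the original untouched. The `columns=` keyword is an alternative to `axis=1`. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'Salary': [50000, 60000, 70000],
})

without_salary = df.drop(columns=['Salary'])   # same as df.drop('Salary', axis=1)

print('Salary' in df.columns)               # original still has the column
print('Salary' in without_salary.columns)   # the returned copy does not
```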

- Incrementing every value in the `age` column by 1 (the operation is applied to the whole column at once)

In [51]:
df['age'] = df['age'] + 1
df

|   | Name | age | location |
|---|---|---|---|
| 0 | Sangamesh | 22 | Bangaluru |
| 1 | Nutan | 23 | Mumbai |
| 2 | Sam | 24 | Hydrabad |


- Dropping a row by its index label (rows are the default axis, `axis=0`)

In [None]:
df.drop(0, inplace=True)  # removes the row labelled 0
df
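Several rows can be dropped at once by passing a list of labels. The remaining rows keep their original labels; `reset_index(drop=True)` renumbers them from 0. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
})

remaining = df.drop([0, 1])                     # keeps only the row labelled 2
renumbered = remaining.reset_index(drop=True)   # relabels rows as 0, 1, ...
print(remaining)
print(renumbered)
```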

#### Inspecting a DataFrame: some important methods and attributes

In [53]:
df = pd.read_csv('sampleData.csv')
df.head(5)

|   | Index | Customer Id | First Name | Last Name | Company | City | Country | Phone 1 | Phone 2 | Email | Subscription Date | Website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ffeCAb7AbcB0f07 | Jared | Jarvis | Sanchez-Fletcher | Hatfieldshire | Eritrea | 274.188.8773x41185 | 001-215-760-4642x969 | gabriellehartman@benjamin.com | 2021-11-11 | https://www.mccarthy.info/ |
| 1 | 2 | b687FfC4F1600eC | Marie | Malone | Mckay PLC | Robertsonburgh | Botswana | 283-236-9529 | (189)129-8356x63741 | kstafford@sexton.com | 2021-05-14 | http://www.reynolds.com/ |
| 2 | 3 | 9FF9ACbc69dcF9c | Elijah | Barrera | Marks and Sons | Kimbury | Barbados | 8252703789 | 459-916-7241x0909 | jeanettecross@brown.com | 2021-03-17 | https://neal.com/ |
| 3 | 4 | b49edDB1295FF6E | Sheryl | Montgomery | Kirby, Vaughn and Sanders | Briannaview | Antarctica (the territory South of 60 deg S) | 425.475.3586 | (392)819-9063 | thomassierra@barrett.com | 2020-09-23 | https://www.powell-bryan.com/ |
| 4 | 5 | 3dcCbFEB17CCf2E | Jeremy | Houston | Lester-Manning | South Brianna | Micronesia | +1-223-666-5313x4530 | 252-488-3850x692 | rubenwatkins@jacobs-wallace.info | 2020-09-18 | https://www.carrillo.com/ |
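Besides `head`, a few attributes and methods give a quick overview of any DataFrame. The sketch uses the small example DataFrame so it runs without `sampleData.csv`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

print(df.shape)             # attribute: (rows, columns)
print(df.columns.tolist())  # attribute: column labels
print(df.index.tolist())    # attribute: row labels
print(df.dtypes)            # attribute: dtype of each column
print(df.tail(2))           # method: last n rows
df.info()                   # method: prints dtypes, non-null counts, memory usage
```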


In [54]:
df.describe()

|       | Index |
|---|---|
| count | 100000.0 |
| mean | 50000.5 |
| std | 28867.657797 |
| min | 1.0 |
| 25% | 25000.75 |
| 50% | 50000.5 |
| 75% | 75000.25 |
| max | 100000.0 |
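By default `describe()` summarizes only numeric columns, which is why just `Index` appears above. A minimal sketch on the small example DataFrame; `include='all'` adds `count`, `unique`, `top`, and `freq` rows for the text columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'location': ['Bangaluru', 'Mumbai', 'Hydrabad'],
})

numeric_summary = df.describe()              # only 'age' is numeric
full_summary = df.describe(include='all')    # every column, numeric and text
print(numeric_summary)
print(full_summary)
```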
