# Pandas - DataFrame and Series

- pandas is a powerful data manipulation library.
- It is widely used for data analysis and cleaning.
- It provides two main data structures:
  - Series (1-dimensional, labeled, array-like object)
  - DataFrame (2-dimensional, labeled, size-mutable, potentially heterogeneous data structure with labeled axes (rows and columns))
- It also provides data alignment, data merging, reshaping, and pivoting capabilities.


In [2]:
import pandas as pd

## Pandas Series:
- A Series is a one-dimensional labeled array of values. It is similar to a list, but with additional features such as index alignment and missing-data handling. It is a core building block for data manipulation and analysis.
- A Series can be created from a list, dictionary, or other iterable. It can also be created from a NumPy array or another Series.
- A Series has methods for data manipulation, such as filtering, sorting, grouping, and combining with other Series. It also has methods for statistical analysis, such as mean, median, and standard deviation.

- It is similar to a column in a table.
- By default, its index is made of integers starting from 0.
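
To make the methods listed above concrete, here is a minimal sketch of a few statistical, filtering, and sorting calls on a Series. The values and the variable name `scores` are made up for illustration.

```python
import pandas as pd

# Illustrative values, not taken from any dataset in this notebook
scores = pd.Series([10, 20, 30, 40, 50])

mean_value = scores.mean()       # arithmetic mean
median_value = scores.median()   # middle value
std_value = scores.std()         # sample standard deviation (ddof=1)

# Boolean filtering keeps the original index labels
above_25 = scores[scores > 25]

# Sorting returns a new Series; the original is unchanged
descending = scores.sort_values(ascending=False)

print(mean_value, median_value, std_value)
print(above_25)
print(descending)
```

Note that filtering and sorting keep each value attached to its original index label, which is one of the main differences from a plain list.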




#### Creating series from list

In [3]:
import pandas as pd
data = [1,2,3,4,5]
series = pd.Series(data)
print(series)



0    1
1    2
2    3
3    4
4    5
dtype: int64


#### Creating series from a dictionary
- Here, each dictionary key becomes an index label of the Series

In [6]:
data = {'a':1,'b':2,'c':3}
series = pd.Series(data)
print(series)

data = [1,2,3,4,5]


a    1
b    2
c    3
dtype: int64


The index labels can also be assigned explicitly (custom indexing):

In [8]:
data = [1,2,3,4,5]
indices = ['a','b','c','d','e']
series = pd.Series(data,index=indices)
print(series)

a    1
b    2
c    3
d    4
e    5
dtype: int64
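
Custom index labels are also what pandas uses for alignment. The following sketch, with two made-up Series `a` and `b`, suggests how arithmetic matches values by label rather than by position, and how labels found in only one operand end up as missing values.

```python
import pandas as pd

# Two hypothetical Series whose labels only partly overlap
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on index labels, not on position
total = a + b

# Labels present in only one operand produce NaN
print(total)

# Label-based lookup on a custom index
print(a['b'])
```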


## Pandas DataFrame:
- Has multiple rows and columns
- It can be built from a dictionary of lists or a list of dictionaries
- Similar to a table in SQL

In [13]:
data = {
  'Name':['Sangamesh','Sam','Nutan'],
  'Age':[20,21,22],
  'Location':['Bangalore','Mumbai','Delhi']
}
indices = ['a','b','c']
df = pd.DataFrame(data, index = indices)
print(df)

        Name  Age   Location
a  Sangamesh   20  Bangalore
b        Sam   21     Mumbai
c      Nutan   22      Delhi


In [7]:
# Creating a DataFrame from a list of dictionaries
data = [{'name': 'John', 'age': 25},  
             {'name': 'Alice ', 'age': 30}, 
             {'name': 'Bob', 'age': 35}]
print(pd.DataFrame(data)) 


     name  age
0    John   25
1  Alice    30
2     Bob   35
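
When a DataFrame is built from a list of dictionaries, the dictionaries do not all need the same keys. The sketch below uses hypothetical records (the name `records` and the missing `age` are assumptions for illustration) to show how a missing key is stored as a missing value.

```python
import pandas as pd

# Hypothetical records; the second one has no 'age' key
records = [
    {'name': 'John', 'age': 25},
    {'name': 'Alice'},
    {'name': 'Bob', 'age': 35},
]
people = pd.DataFrame(records)

# The missing value is stored as NaN, so 'age' becomes a float column
print(people)
print(people['age'].isna().tolist())
```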


In [4]:
# Load the CSV file into a DataFrame and print the first five rows

df = pd.read_csv('sampleData.csv')
print(df.head(5))

   Index      Customer Id First Name   Last Name                    Company  \
0      1  ffeCAb7AbcB0f07      Jared      Jarvis           Sanchez-Fletcher   
1      2  b687FfC4F1600eC      Marie      Malone                  Mckay PLC   
2      3  9FF9ACbc69dcF9c     Elijah     Barrera             Marks and Sons   
3      4  b49edDB1295FF6E     Sheryl  Montgomery  Kirby, Vaughn and Sanders   
4      5  3dcCbFEB17CCf2E     Jeremy     Houston             Lester-Manning   

             City                                       Country  \
0   Hatfieldshire                                       Eritrea   
1  Robertsonburgh                                      Botswana   
2         Kimbury                                      Barbados   
3     Briannaview  Antarctica (the territory South of 60 deg S)   
4   South Brianna                                    Micronesia   

                Phone 1               Phone 2  \
0    274.188.8773x41185  001-215-760-4642x969   
1          283-236-9529 

In [5]:
last5 = df.tail(5)
print(last5)

        Index      Customer Id First Name Last Name                Company  \
99995   99996  67F24BEBAa16d1c       Dana   Winters  Pham, Conner and Wade   
99996   99997  17b1dbDaB2ad0fB   Gabriela   Pacheco         Fletcher-Hodge   
99997   99998  c586CFBA6fb9dcC    Mikayla   Hubbard             Austin Ltd   
99998   99999  bb6cb6AC9d0CAf7     Javier      Berg              Welch Inc   
99999  100000  FaE5E3c1Ea0dEc2     Kaylee   Hubbard            Booker-Luna   

                  City       Country                Phone 1  \
99995         Kirkfurt  Sierra Leone           820-930-7616   
99996    Virginiahaven       Comoros        +1-480-464-8646   
99997       Sheriville       Mayotte  +1-567-149-3941x67118   
99998  Stephenschester       Belarus          (381)105-4698   
99999       New Karina       Estonia     (184)132-6303x4566   

                    Phone 2                               Email  \
99995  +1-061-779-5511x3267                 ppittman@watson.com   
99996   (015)822-1

In [7]:
data = {
  'Name':['Sangamesh','Nutan','Sam'],
  'age':[21,22,23],
  'location':['Bangaluru','Mumbai','Hydrabad']
}
df = pd.DataFrame(data)
print(df)

        Name  age   location
0  Sangamesh   21  Bangaluru
1      Nutan   22     Mumbai
2        Sam   23   Hydrabad


#### Accessing data from the dataframe
- To access a column, use the column name, e.g. `df['Name']`
- To access a row, use its label with `df.loc[0]` or its integer position with `df.iloc[0]`


In pandas, `loc` and `iloc` are both used for data selection and manipulation, but they operate differently:
##### `loc`
- Label-based: selects data by labels or boolean arrays.
- Inclusive: slicing includes both the start and the end label.
- Syntax: `df.loc[row_label, column_label]`
##### `iloc`
- Integer-based: selects data by integer position.
- Exclusive: slicing excludes the end position.
- Syntax: `df.iloc[row_index, column_index]`

Key points:
- Use `loc` when you need to select data by labels.
- Use `iloc` when you need to select data by integer positions.
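
The difference in slicing behavior is easiest to see when labels and positions are not the same. This is a minimal sketch on a hypothetical frame named `people` with string labels; the names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with string labels so labels and positions differ
people = pd.DataFrame(
    {'Name': ['Sangamesh', 'Nutan', 'Sam'], 'age': [21, 22, 23]},
    index=['a', 'b', 'c'],
)

# loc slices by label and includes the end label 'b'
by_label = people.loc['a':'b']

# iloc slices by position and excludes the end position 1
by_position = people.iloc[0:1]

# loc also accepts a boolean array for the rows
older = people.loc[people['age'] > 21, 'Name']

print(len(by_label), len(by_position))
print(older.tolist())
```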



In [10]:
# Accessing columns by name
print(df['Name'])
print(df['location'])

0    Sangamesh
1        Nutan
2          Sam
Name: Name, dtype: object
0    Bangaluru
1       Mumbai
2     Hydrabad
Name: location, dtype: object


In [18]:
# Select a whole row by its label
print(df.loc[0])
print()
# Pick one value from that row by position
print(df.loc[0].iloc[2])
print()
# Select a single value by row label and column label
print(df.loc[0,'location'])


Name        Sangamesh
age                21
location    Bangaluru
Name: 0, dtype: object

Bangaluru

Bangaluru


In [19]:
df.iloc[0]

Name        Sangamesh
age                21
location    Bangaluru
Name: 0, dtype: object

In [22]:
# Accessing a single element

specific = df.at[0,"location"]   # by row label and column label
index_specific = df.iat[0,2]     # by row position and column position

print(specific)  
print(index_specific)

Bangaluru
Bangaluru


#### Data manipulation using DataFrames

In [17]:
df

        Name  age   location
0  Sangamesh   21  Bangaluru
1      Nutan   22     Mumbai
2        Sam   23   Hydrabad


- Adding a new column

In [24]:
# Adding a new column
df['Salary'] = [200000,250000,300000]
print(df)

        Name  age   location  Salary
0  Sangamesh   21  Bangaluru  200000
1      Nutan   22     Mumbai  250000
2        Sam   23   Hydrabad  300000


### Removing a column
- When removing a column with the `.drop()` method, we need to provide:
1) the column name as a string
2) `axis=1` (1 is for columns, 0 is for rows)
- `.drop()` returns a new DataFrame; the original `df` is not modified.

In [30]:
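# drop() returns a new DataFrame; df itself is left unchanged
df.drop('Salary',axis=1)   # copy without the 'Salary' column (result discarded here)
df.drop(1)                 # copy without the row labeled 1 (shown as the cell output)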

        Name  age   location  Salary
0  Sangamesh   21  Bangaluru  200000
2        Sam   23   Hydrabad  300000


`inplace=True` modifies the existing DataFrame instead of returning a new one, while `axis=1` tells `.drop()` that the label refers to a column.

In [31]:
df.drop('Salary',axis=1,inplace=True)

In [32]:
df

        Name  age   location
0  Sangamesh   21  Bangaluru
1      Nutan   22     Mumbai
2        Sam   23   Hydrabad
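
An alternative to `inplace=True` is to assign the result of `.drop()` to a variable. The sketch below rebuilds a small frame under the hypothetical name `people` and uses the `columns=` and `index=` keywords, which can be used instead of `axis`.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
    'Salary': [200000, 250000, 300000],
})

# drop() returns a new DataFrame; assign it to keep the result
without_salary = people.drop(columns='Salary')   # same as drop('Salary', axis=1)
without_row_1 = people.drop(index=1)             # same as drop(1, axis=0)

# The original frame is untouched
print(people.columns.tolist())
print(without_salary.columns.tolist())
print(without_row_1.index.tolist())
```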


- Now the `Salary` column has been removed

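- Incrementing every value in the `age` column (the output shown here appears to come from a session in which row `0` had already been dropped and this cell had been run twice, hence two rows with ages 24 and 25)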

In [51]:
df['age']=df['age']+1
df

    Name  age  location
1  Nutan   24    Mumbai
2    Sam   25  Hydrabad
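
For comparison, here is a self-contained sketch of the same increment applied once to a freshly built frame. The variable name `people` is an assumption for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Sangamesh', 'Nutan', 'Sam'], 'age': [21, 22, 23]})

# Arithmetic on a column is applied element-wise to every row
people['age'] = people['age'] + 1

# Running the line above again would add 1 a second time, which matters
# when notebook cells are re-executed
print(people['age'].tolist())
```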


- Dropping a row by its index label

In [None]:
df.drop(0,inplace = True)
df

#### Some important attributes and methods of DataFrames:

In [53]:
df= pd.read_csv('sampleData.csv')
df.head(5)

Unnamed: 0,Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website
0,1,ffeCAb7AbcB0f07,Jared,Jarvis,Sanchez-Fletcher,Hatfieldshire,Eritrea,274.188.8773x41185,001-215-760-4642x969,gabriellehartman@benjamin.com,2021-11-11,https://www.mccarthy.info/
1,2,b687FfC4F1600eC,Marie,Malone,Mckay PLC,Robertsonburgh,Botswana,283-236-9529,(189)129-8356x63741,kstafford@sexton.com,2021-05-14,http://www.reynolds.com/
2,3,9FF9ACbc69dcF9c,Elijah,Barrera,Marks and Sons,Kimbury,Barbados,8252703789,459-916-7241x0909,jeanettecross@brown.com,2021-03-17,https://neal.com/
3,4,b49edDB1295FF6E,Sheryl,Montgomery,"Kirby, Vaughn and Sanders",Briannaview,Antarctica (the territory South of 60 deg S),425.475.3586,(392)819-9063,thomassierra@barrett.com,2020-09-23,https://www.powell-bryan.com/
4,5,3dcCbFEB17CCf2E,Jeremy,Houston,Lester-Manning,South Brianna,Micronesia,+1-223-666-5313x4530,252-488-3850x692,rubenwatkins@jacobs-wallace.info,2020-09-18,https://www.carrillo.com/


In [54]:
df.describe()

               Index
count  100000.000000
mean    50000.500000
std     28867.657797
min         1.000000
25%     25000.750000
50%     50000.500000
75%     75000.250000
max    100000.000000
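
Besides `head()` and `describe()`, a few attributes are commonly used to inspect a DataFrame. The sketch below runs them on a small hypothetical frame named `people` rather than on `sampleData.csv`, so it can be executed without the file.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Sangamesh', 'Nutan', 'Sam'],
    'age': [21, 22, 23],
})

print(people.shape)             # (number of rows, number of columns)
print(people.columns.tolist())  # column labels
print(people.dtypes)            # data type of each column
print(people.index)             # row labels

# describe() summarizes only the numeric columns by default
summary = people.describe()
print(summary.loc['mean', 'age'])
```

This is also why the `describe()` output above contains only the `Index` column: the other columns in the CSV hold text.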
