# 4. Working with Columns in Pandas

In [1]:
import pandas as pd

#### Let's load a new sample CSV dataset.

In [2]:
df = pd.read_csv('Datasets/course.csv')
df.head()

Unnamed: 0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
0,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
1,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
2,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months
3,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months
4,6,Hyderabad,LMN institute,Could,AP,4 Months


#### As we can see, the file has no header row, so pandas has used the first data row as the column names. Let's begin by adding a header (column names), as we did in the previous chapter.

## a. Adding Column Names/Header

In [3]:
header = ['Sr no','City','Coaching Center','Courses','State','Course Duration']

df = pd.read_csv('Datasets/course.csv',header=None,names=header)
df.head()

Unnamed: 0,Sr no,City,Coaching Center,Courses,State,Course Duration
0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
1,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
2,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
3,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months
4,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months
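
To see the effect of `header=None` and `names=` without needing the file on disk, here is a minimal sketch. It assumes the first three rows of the dataset pasted into a string and read through `io.StringIO`; the variable names `raw`, `wrong` and `right` are only for illustration.

```python
import io
import pandas as pd

# First three rows of the dataset as an in-memory, header-less CSV.
raw = """1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
"""

header = ['Sr no', 'City', 'Coaching Center', 'Courses', 'State', 'Course Duration']

# With the defaults, pandas treats the first data row as the column names.
wrong = pd.read_csv(io.StringIO(raw))

# header=None says the file has no header row; names= supplies our own labels.
right = pd.read_csv(io.StringIO(raw), header=None, names=header)

print(len(wrong), list(wrong.columns)[:2])   # one row is lost to the header
print(len(right), list(right.columns)[:2])   # all three rows are kept
```

With the default settings one data row is consumed as the header, which is exactly what the first `df.head()` output above shows.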


## b. Selecting a Column

##### There are two ways to select a column:
i. Dot Notation
ii. Bracket Notation

### i. Dot Notation

##### Syntax: DataFrame.Column_name

In [4]:
df.City.head()        # .head() returns only the first five rows of the column

0       Mumbai
1        Delhi
2    Bengaluru
3       Bhopal
4       Mumbai
Name: City, dtype: object

#### Make sure you write the column name exactly as it appears, because column names are case-sensitive.

In [5]:
# for example:
df.city.head()

AttributeError: 'DataFrame' object has no attribute 'city'

#### As you can see, writing City with a lowercase c raises an error.
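
A quick way to avoid this kind of error is to check the exact spelling of the labels first. The sketch below uses a small stand-in DataFrame (not the course dataset) and shows that the wrong spelling fails with both notations.

```python
import pandas as pd

# Small stand-in frame with the same 'City' and 'State' labels.
df = pd.DataFrame({'City': ['Mumbai', 'Delhi'], 'State': ['MH', 'UP']})

# Labels are matched exactly, so 'City' and 'city' are different names.
print('City' in df.columns)   # True
print('city' in df.columns)   # False

# Listing the labels shows the exact spelling to use.
print(list(df.columns))

# The wrong spelling fails either way: AttributeError with dot notation,
# KeyError with bracket notation.
try:
    df.city
except AttributeError as err:
    dot_error = type(err).__name__

try:
    df['city']
except KeyError as err:
    bracket_error = type(err).__name__

print(dot_error, bracket_error)
```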

### However, dot notation will not work if the column name contains a space. In that case we have to use bracket notation.

In [6]:
# for example:
df.Sr no

SyntaxError: invalid syntax (<ipython-input-6-4fc0569430b9>, line 2)

In [7]:
df.Coaching Center

SyntaxError: invalid syntax (<ipython-input-7-850d4a869666>, line 1)

### ii. Bracket Notation

##### Syntax: DataFrame['column_name']

In [8]:
df['Sr no'].head()

0    1
1    2
2    3
3    4
4    5
Name: Sr no, dtype: int64

In [9]:
df['Coaching Center'].head()

0         PQR institute .Pvt
1     ABC institute .Pvt.LTD
2        MNO  institute .Ltd
3    RST  institute .Pvt.LTD
4    EFG  institute .Pvt.LTD
Name: Coaching Center, dtype: object

### Bracket notation works for every column name, so it is the safer default for selecting or working with columns.
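
Spaces are not the only case where dot notation gets in the way. The sketch below uses a hypothetical frame with a column called `count`, a name that clashes with the existing `DataFrame.count` method; it is an illustration, not part of the course dataset.

```python
import pandas as pd

# Hypothetical frame: 'count' clashes with the DataFrame.count method,
# and 'Sr no' contains a space.
df = pd.DataFrame({'Sr no': [1, 2], 'count': [10, 20]})

# Dot notation finds the built-in method, not the column.
print(callable(df.count))      # True: this is DataFrame.count

# Bracket notation always returns the column.
print(df['count'].tolist())    # [10, 20]
print(df['Sr no'].tolist())    # [1, 2]
```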

## c.Adding new Column in DataFrame

#### Let's first have a look at our DataFrame.

In [10]:
df.head()

Unnamed: 0,Sr no,City,Coaching Center,Courses,State,Course Duration
0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
1,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
2,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
3,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months
4,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months


#### Our DataFrame has two columns named City and State. Our objective is to join the City and State columns and then add the result to the DataFrame as Location.

#### Joining two string columns works just like string concatenation:

In [11]:
df['City'] + ', ' + df['State']

0        Mumbai, MH
1         Delhi, UP
2     Bengaluru, TN
3        Bhopal, MP
4        Mumbai, MH
5     Hyderabad, AP
6         Patna, BH
7       Channai, KT
8       Kolkata, WB
9     Hyderabad, AP
10       Bhopal, MP
11         Pune, MH
12        Delhi, UP
13       Mumbai, MH
14        Patna, BH
15        Surat, GJ
16        Delhi, UP
17       Mumbai, MH
18         Pune, MH
19    Bengaluru, TN
20        Surat, Gj
21    Hyderabad, AP
22       Mumbai, MH
23      Channai, KT
24        Patna, BH
dtype: object

#### So let's store this in a variable called Location.

In [12]:
Location = df['City'] + ', ' + df['State']
Location

0        Mumbai, MH
1         Delhi, UP
2     Bengaluru, TN
3        Bhopal, MP
4        Mumbai, MH
5     Hyderabad, AP
6         Patna, BH
7       Channai, KT
8       Kolkata, WB
9     Hyderabad, AP
10       Bhopal, MP
11         Pune, MH
12        Delhi, UP
13       Mumbai, MH
14        Patna, BH
15        Surat, GJ
16        Delhi, UP
17       Mumbai, MH
18         Pune, MH
19    Bengaluru, TN
20        Surat, Gj
21    Hyderabad, AP
22       Mumbai, MH
23      Channai, KT
24        Patna, BH
dtype: object
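
The concatenation above works because City and State both hold strings. As a hedged aside, the sketch below (on a two-row stand-in frame) shows the extra step needed when one of the columns is numeric, such as Sr no: it has to be converted with `astype(str)` first. The `label` name is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Sr no': [1, 2],
    'City': ['Mumbai', 'Delhi'],
    'State': ['MH', 'UP'],
})

# String columns concatenate element by element, like ordinary strings.
location = df['City'] + ', ' + df['State']

# A numeric column must be converted to text before it can be joined;
# adding a str directly to an int64 column raises an error.
label = df['Sr no'].astype(str) + '. ' + location

print(label.tolist())   # ['1. Mumbai, MH', '2. Delhi, UP']
```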

#### Now let's add Location to our DataFrame. Note: when assigning a new column to a DataFrame, always use bracket notation, because dot notation will not create a column.

In [13]:
df['Location'] = Location
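
To make the note about dot notation concrete, here is a minimal sketch on two copies of a stand-in frame (`df_dot` and `df_bracket` are illustrative names). Dot assignment only sets a plain Python attribute, and pandas emits a warning about it, which is silenced here to keep the output clean.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'City': ['Mumbai', 'Delhi'], 'State': ['MH', 'UP']})
location = df['City'] + ', ' + df['State']

df_dot = df.copy()
df_bracket = df.copy()

# Dot assignment sets an attribute on the object; no column is created.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df_dot.Location = location
print('Location' in df_dot.columns)       # False

# Bracket assignment creates the column.
df_bracket['Location'] = location
print('Location' in df_bracket.columns)   # True
```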

In [14]:
df

Unnamed: 0,Sr no,City,Coaching Center,Courses,State,Course Duration,Location
0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months,"Mumbai, MH"
1,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months,"Delhi, UP"
2,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months,"Bengaluru, TN"
3,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months,"Bhopal, MP"
4,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months,"Mumbai, MH"
5,6,Hyderabad,LMN institute,Could,AP,4 Months,"Hyderabad, AP"
6,7,Patna,DEF institute .Ltd,Web Serv,BH,4 Months,"Patna, BH"
7,8,Channai,PQR institute .Pvt.LTD,AWS,KT,3 Months,"Channai, KT"
8,9,Kolkata,UVW institute .Ltd,Networking,WB,3 Months,"Kolkata, WB"
9,10,Hyderabad,GHI institute .Ltd,AI,AP,18 Months,"Hyderabad, AP"


#### Our new column Location has been added to the DataFrame, but the change is not reflected in the original dataset file. If you import the same dataset again, you will find that it is unchanged.

## d. Deleting Columns in a Pandas DataFrame

### Syntax: DataFrame.drop(column_name, axis=, inplace=)

- **column_name** - a single column name or a list of the column names that you want to drop.
- **axis** - pass axis=1 or axis='columns' to drop columns.
- **inplace** - takes a boolean value, True or False. With inplace=True the existing DataFrame is modified; with the default, inplace=False, a new DataFrame is returned and the existing one is left unchanged.

In [15]:
df.drop('City',axis='columns').head()
# Without inplace=True, drop returns a new DataFrame and df itself is left unchanged.

Unnamed: 0,Sr no,Coaching Center,Courses,State,Course Duration,Location
0,1,PQR institute .Pvt,BI,MH,9 Months,"Mumbai, MH"
1,2,ABC institute .Pvt.LTD,AI,UP,18 Months,"Delhi, UP"
2,3,MNO institute .Ltd,Data Science,TN,11 Months,"Bengaluru, TN"
3,4,RST institute .Pvt.LTD,ML,MP,3 Months,"Bhopal, MP"
4,5,EFG institute .Pvt.LTD,DL,MH,3 Months,"Mumbai, MH"


In [16]:
df.head()

Unnamed: 0,Sr no,City,Coaching Center,Courses,State,Course Duration,Location
0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months,"Mumbai, MH"
1,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months,"Delhi, UP"
2,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months,"Bengaluru, TN"
3,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months,"Bhopal, MP"
4,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months,"Mumbai, MH"


In [17]:
# Deleting a single column
df.drop('City',axis='columns',inplace=True)
df.head()      

Unnamed: 0,Sr no,Coaching Center,Courses,State,Course Duration,Location
0,1,PQR institute .Pvt,BI,MH,9 Months,"Mumbai, MH"
1,2,ABC institute .Pvt.LTD,AI,UP,18 Months,"Delhi, UP"
2,3,MNO institute .Ltd,Data Science,TN,11 Months,"Bengaluru, TN"
3,4,RST institute .Pvt.LTD,ML,MP,3 Months,"Bhopal, MP"
4,5,EFG institute .Pvt.LTD,DL,MH,3 Months,"Mumbai, MH"


In [18]:
# Deleting multiple columns
df.drop(['State','Course Duration'],axis='columns',inplace=True) # you can pass as many columns as you want in the list
df.head()      

Unnamed: 0,Sr no,Coaching Center,Courses,Location
0,1,PQR institute .Pvt,BI,"Mumbai, MH"
1,2,ABC institute .Pvt.LTD,AI,"Delhi, UP"
2,3,MNO institute .Ltd,Data Science,"Bengaluru, TN"
3,4,RST institute .Pvt.LTD,ML,"Bhopal, MP"
4,5,EFG institute .Pvt.LTD,DL,"Mumbai, MH"
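
The three `drop` calls above can be summarised in a short sketch on a stand-in frame. It also shows two alternatives that are assumptions about style rather than part of the walkthrough: the `columns=` keyword, and reassigning the result instead of using `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Mumbai', 'Delhi'],
    'State': ['MH', 'UP'],
    'Courses': ['BI', 'AI'],
})

# Without inplace=True, drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop('City', axis='columns')
print(list(trimmed.columns))   # ['State', 'Courses']
print(list(df.columns))        # ['City', 'State', 'Courses']

# columns= is an equivalent spelling of axis='columns'.
same = df.drop(columns=['City'])

# Reassigning the result is a common alternative to inplace=True.
df = df.drop(['State', 'Courses'], axis=1)
print(list(df.columns))        # ['City']
```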


#### But remember, these changes are made only to our DataFrame in memory. The original dataset file is not affected by any change we make here.

#### Let's once again import our dataset

In [57]:
df = pd.read_csv('Datasets/course.csv')
df.head()

Unnamed: 0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
0,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
1,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
2,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months
3,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months
4,6,Hyderabad,LMN institute,Could,AP,4 Months


#### As you can see, nothing has changed in the file: the header we created is missing, the Location column we added is missing, and the City, State and Course Duration columns that we deleted are still there.
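
If the changes should be kept, the DataFrame has to be written out explicitly. The sketch below is one possible way, using `DataFrame.to_csv` on a stand-in frame; the output path is hypothetical and points into a temporary directory so that no real file is overwritten.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'City': ['Mumbai', 'Delhi'], 'State': ['MH', 'UP']})
df['Location'] = df['City'] + ', ' + df['State']

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'course_with_location.csv')

# index=False keeps the 0, 1, 2, ... row labels out of the file.
df.to_csv(out_path, index=False)

# Reading the new file back shows that the added column was saved.
reloaded = pd.read_csv(out_path)
print(list(reloaded.columns))   # ['City', 'State', 'Location']
```

Writing to a new file rather than over `Datasets/course.csv` keeps the original data available for comparison.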

## e. Setting a New Index

**Syntax**:  DataFrame.set_index('new_index',inplace=True)

#### Before that, let's first assign our header.

In [67]:
header = ['Sr no','City','Coaching Center','Courses','State','Course Duration']
df = pd.read_csv('Datasets/course.csv',header=None,names=header)
df.head()

Unnamed: 0,Sr no,City,Coaching Center,Courses,State,Course Duration
0,1,Mumbai,PQR institute .Pvt,BI,MH,9 Months
1,2,Delhi,ABC institute .Pvt.LTD,AI,UP,18 Months
2,3,Bengaluru,MNO institute .Ltd,Data Science,TN,11 Months
3,4,Bhopal,RST institute .Pvt.LTD,ML,MP,3 Months
4,5,Mumbai,EFG institute .Pvt.LTD,DL,MH,3 Months


### Let's make the Courses column our new index.

In [68]:
df.set_index('Courses',inplace=True)

In [69]:
df.head()

Courses,Sr no,City,Coaching Center,State,Course Duration
BI,1,Mumbai,PQR institute .Pvt,MH,9 Months
AI,2,Delhi,ABC institute .Pvt.LTD,UP,18 Months
Data Science,3,Bengaluru,MNO institute .Ltd,TN,11 Months
ML,4,Bhopal,RST institute .Pvt.LTD,MP,3 Months
DL,5,Mumbai,EFG institute .Pvt.LTD,MH,3 Months


#### Now the Courses column has become our new index.
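
Here is a small sketch of what `set_index` does, using a three-row stand-in frame. It uses the non-inplace form, which returns a new DataFrame, and then a `.loc` lookup as one illustration of why a meaningful index can be handy; `by_course` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Sr no': [1, 2, 3],
    'City': ['Mumbai', 'Delhi', 'Bengaluru'],
    'Courses': ['BI', 'AI', 'Data Science'],
})

# Without inplace=True, set_index returns a new DataFrame.
by_course = df.set_index('Courses')

# The column leaves the list of columns and becomes the row labels.
print(list(by_course.columns))       # ['Sr no', 'City']
print(by_course.index.name)          # Courses

# Rows can now be looked up by course name with .loc.
print(by_course.loc['AI', 'City'])   # Delhi
```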

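## Resetting the Index
**Syntax**:
- DataFrame.index.name = 'current index name' (this names the column that reset_index creates; set_index already stores the column name as the index name)
- DataFrame.reset_index(inplace=True)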

In [70]:
df.index.name='Courses'
df.reset_index(inplace=True)
df.head()

Unnamed: 0,Courses,Sr no,City,Coaching Center,State,Course Duration
0,BI,1,Mumbai,PQR institute .Pvt,MH,9 Months
1,AI,2,Delhi,ABC institute .Pvt.LTD,UP,18 Months
2,Data Science,3,Bengaluru,MNO institute .Ltd,TN,11 Months
3,ML,4,Bhopal,RST institute .Pvt.LTD,MP,3 Months
4,DL,5,Mumbai,EFG institute .Pvt.LTD,MH,3 Months


### This changes the index back to the default 0, 1, 2, ... and moves the previous index into the first column.
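
The sketch below, on a two-row stand-in frame, shows the default behaviour of `reset_index` next to the `drop=True` option, which discards the old index instead of turning it back into a column. The names `restored` and `discarded` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Sr no': [1, 2], 'Courses': ['BI', 'AI']}).set_index('Courses')

# set_index already stored the name, so it does not need to be assigned again here.
print(df.index.name)              # Courses

# Default: the old index comes back as the first column.
restored = df.reset_index()
print(list(restored.columns))     # ['Courses', 'Sr no']

# drop=True discards the old index instead of keeping it as a column.
discarded = df.reset_index(drop=True)
print(list(discarded.columns))    # ['Sr no']
```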

### One more thing: although we are dealing with a DataFrame here, we should keep in mind that every single row and every single column in a DataFrame is just a Series.

So selecting the City column from a DataFrame df simply means selecting a Series named City from df.

In [72]:
type(df)  # the df variable is certainly a DataFrame, but

pandas.core.frame.DataFrame

In [73]:
type(df.City)  # df.City, the City column, is a Series

pandas.core.series.Series
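
The same check can be extended to rows. The following sketch, on a stand-in frame, selects one column, one row (with `iloc`, which is not covered in this chapter) and a list of columns, and prints the type of each result.

```python
import pandas as pd

df = pd.DataFrame({'City': ['Mumbai', 'Delhi'], 'State': ['MH', 'UP']})

column = df['City']        # one column -> Series
row = df.iloc[0]           # one row -> Series, labelled by the column names
sub_frame = df[['City']]   # a list of columns -> still a DataFrame

print(type(column).__name__)      # Series
print(type(row).__name__)         # Series
print(type(sub_frame).__name__)   # DataFrame
print(column.name, list(row.index))
```

Note the double brackets in `df[['City']]`: passing a list, even with a single name, returns a DataFrame rather than a Series.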