## Outline


  - **Installation of pandas** 
    - Importing pandas
    - Importing the dataset 
    - DataFrame/Series

  - **Basic ops on a DataFrame** 
    - df.info()
    - df.head()
    - df.tail()
    - df.shape
  - **Creating a DataFrame from scratch**
  - **Basic ops on columns** 
    - Different ways of accessing columns
    - Checking for unique values
    - Renaming columns
    - Deleting columns
    - Creating new columns
  


## Importing Pandas 

- You should be able to import Pandas after installing it


- We'll import `pandas` as its **alias name `pd`**

In [None]:
import pandas as pd
import numpy as np

## Introduction: Why use Pandas? 


#### How is it different from numpy ?

  - The major **limitation of NumPy** is that an array can hold only **one datatype** at a time

  - Most real-world datasets contain a mixture of different datatypes 
    - E.g., **names of places are strings**, but their **populations are integers**
  
==> It is **difficult to work** with data having **heterogeneous values using Numpy**
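To see this limitation concretely, here is a small sketch using values from the first rows of the dataset. NumPy coerces the mixed values into one string dtype, while a pandas DataFrame keeps a separate dtype per column (the names `arr` and `mixed` are just for illustration).

```python
import numpy as np
import pandas as pd

# NumPy: a single dtype for the whole array, so the integer is stored as text
arr = np.array(["Afghanistan", 8425333])
print(arr.dtype)     # a Unicode string dtype, e.g. <U21
print(arr)           # both elements are now strings

# pandas: each column keeps its own dtype
mixed = pd.DataFrame({"country": ["Afghanistan", "Afghanistan"],
                      "population": [8425333, 9240934]})
print(mixed.dtypes)  # population stays int64; country stays text
```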

#### Pandas can work with numbers and strings together

<!-- - If our **data has only numbers**, we are better off using **Numpy** 
  - It's **lighter** and **easier**

- But if our data has both **number and non-number vals**, it makes sense to use **Pandas** -->

So let's see how we can use pandas.



## Imagine that you are a Data Scientist with McKinsey



  - McKinsey wants to understand the relation between **GDP per capita** and **life expectancy** and various trends for their clients.
  - The company has acquired **data from multiple surveys** in different countries in the past
  - It contains data from several years on:
    - country
    - population size
    - life expectancy
    - GDP per Capita
  - We have to analyse the data and draw **inferences** meaningful to the company

## Reading dataset in Pandas

Link: https://drive.google.com/file/d/1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_/view?usp=sharing

In [None]:
!wget "https://drive.google.com/uc?export=download&id=1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_" -O gapminder.csv


#### Now how should we read this dataset?

Pandas makes it easy to load CSV files like this one with `pd.read_csv()`



In [None]:
df = pd.read_csv('gapminder.csv') # We are storing the data in df
df

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623
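If the download fails, a minimal sketch like the one below lets you experiment offline. `pd.read_csv()` accepts any file-like object, so we wrap a few rows copied from the output above in `io.StringIO`. This assumes the file's header row matches the column names shown; `csv_text` and `sample` are illustrative names.

```python
import io
import pandas as pd

# A few rows in the same CSV layout, held in memory instead of on disk
csv_text = """country,year,population,continent,life_exp,gdp_cap
Afghanistan,1952,8425333,Asia,28.801,779.445314
Afghanistan,1957,9240934,Asia,30.332,820.853030
Afghanistan,1962,10267083,Asia,31.997,853.100710
"""

sample = pd.read_csv(io.StringIO(csv_text))  # parsed exactly like a file path
print(sample)
print(sample.shape)  # (3, 6)
```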


## Dataframe and Series


#### What can we observe from the above dataset ?

We can see that it has:
  - 6 columns
  - 1704 rows 

#### What do you think is the datatype of `df` ?

In [None]:
type(df)

pandas.core.frame.DataFrame

It's a **pandas DataFrame**

### What is a pandas DataFrame ?

  - It is a table-like representation of data in Pandas => Structured Data 
  - **Structured Data** here can be thought of as **tabular data in a proper order**
  - Can be thought of as the **counterpart of a 2D array** in NumPy, except that each column can have its own datatype

#### Now how can we access a column, say `country` of the dataframe?

In [None]:
df["country"]

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object

As you can see, we get all the values in the column **country**

#### Now what is the data-type of a column?

In [None]:
type(df["country"])

pandas.core.series.Series

It's a **pandas Series**

### What is a pandas Series ?
  - A **Series** in pandas is what a **1D array (vector)** is in NumPy

#### What exactly does that mean?

  - It means a Series is a **single column** of **data**, along with an **index** of row labels

  - **Multiple Series stack together to form a DataFrame**
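The idea that Series stack together into a DataFrame can be sketched directly: below, two Series built from values in the first rows are combined with a dictionary (column name → Series), and selecting a column gives back a Series. The names `country`, `year`, and `table` are illustrative.

```python
import pandas as pd

# Each Series is one column of data with an index of row labels (0, 1, 2)
country = pd.Series(["Afghanistan", "Afghanistan", "Afghanistan"])
year = pd.Series([1952, 1957, 1962])

# Stacking the Series side by side gives a DataFrame
table = pd.DataFrame({"country": country, "year": year})
print(table)

# Selecting one column gives back a Series
print(type(table["year"]))  # <class 'pandas.core.series.Series'>
```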
  

Now we understand what Series and DataFrames are 

#### What if a dataset has 100 rows ... Or 100 columns ?

#### How can we find the name, datatype, and number of entries of each column ?


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     1704 non-null   object 
 1   year        1704 non-null   int64  
 2   population  1704 non-null   int64  
 3   continent   1704 non-null   object 
 4   life_exp    1704 non-null   float64
 5   gdp_cap     1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB


`df.info()` gives a **list of columns** with:

- **Name/Title** of each column
- **How many non-null (non-missing) values** each column has
- **Type of values** (dtype) in each column: int, float, etc.

Columns holding **strings (or mixed types)** are shown with the dtype **`object`**. We will come back to this later
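If you only need the datatypes, `df.dtypes` returns them as a Series. Here is a small sketch on a hand-made one-row DataFrame (values copied from the first row of the output above); depending on your pandas version, the string columns may be reported as `object` or as a dedicated string dtype.

```python
import pandas as pd

row = pd.DataFrame([["Afghanistan", 1952, 8425333, "Asia", 28.801, 779.445314]],
                   columns=["country", "year", "population", "continent", "life_exp", "gdp_cap"])

# One entry per column: int64 for whole numbers, float64 for decimals,
# and object (or a string dtype in newer pandas) for text
print(row.dtypes)
```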

#### Now what if we want to see the first few rows in the dataset ? 



In [None]:
df.head()

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


It **prints the first 5 rows by default**

We can also **pass the number of rows we want to see** to `head()`



In [None]:
df.head(20)

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
5,Afghanistan,1977,14880372,Asia,38.438,786.11336
6,Afghanistan,1982,12881816,Asia,39.854,978.011439
7,Afghanistan,1987,13867957,Asia,40.822,852.395945
8,Afghanistan,1992,16317921,Asia,41.674,649.341395
9,Afghanistan,1997,22227415,Asia,41.763,635.341351


#### Similarly, what if we want to see the last 20 rows ?

In [None]:
df.tail(20)  # Similar to head(), but from the end

,country,year,population,continent,life_exp,gdp_cap
1684,Zambia,1972,4506497,Africa,50.107,1773.498265
1685,Zambia,1977,5216550,Africa,51.386,1588.688299
1686,Zambia,1982,6100407,Africa,51.821,1408.678565
1687,Zambia,1987,7272406,Africa,50.821,1213.315116
1688,Zambia,1992,8381163,Africa,46.1,1210.884633
1689,Zambia,1997,9417789,Africa,40.238,1071.353818
1690,Zambia,2002,10595811,Africa,39.193,1071.613938
1691,Zambia,2007,11746035,Africa,42.384,1271.211593
1692,Zimbabwe,1952,3080907,Africa,48.451,406.884115
1693,Zimbabwe,1957,3646340,Africa,50.469,518.764268


#### How can we find the shape of the dataframe?

In [None]:
df.shape

(1704, 6)

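Similar to NumPy, it gives the **dimensions** as a tuple: **(no. of rows, no. of columns)**. Note that `shape` is an **attribute**, not a method, so we don't use parentheses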
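Since `shape` is a plain tuple, it can be unpacked into separate variables. A small sketch on a hand-made DataFrame (the names `sample`, `n_rows`, `n_cols` are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"country": ["Afghanistan", "Afghanistan", "Afghanistan"],
                       "year": [1952, 1957, 1962]})

# shape holds a (rows, columns) tuple, so no parentheses
n_rows, n_cols = sample.shape
print(n_rows, n_cols)  # 3 2

# len() on a DataFrame counts the rows only
print(len(sample))     # 3
```

Writing `sample.shape()` would fail with a `TypeError`, because a tuple is not callable.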

Now we know how to do some basic operations on dataframes



But what if we aren't loading a dataset and want to create our own DataFrame instead?

Let's take a subset of the original dataset

In [None]:
df.head(3) # We take the first 3 rows to create our dataframe

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071


## How can we create a DataFrame from scratch?


### Approach 1: Row-oriented


- We pass **2 things** to `pd.DataFrame()`:
    - The data as a **list of rows**
      - Each **row** is packed in a **list `[]`**
      - All rows are packed in an **outer list `[[]]`**, so we **pass a list of rows**
    - A **list of column names/labels** via the `columns` parameter

In [None]:
pd.DataFrame([['Afghanistan', 1952, 8425333, 'Asia', 28.801, 779.445314],
              ['Afghanistan', 1957, 9240934, 'Asia', 30.332, 820.853030],
              ['Afghanistan', 1962, 10267083, 'Asia', 31.997, 853.100710]],
             columns=['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'])

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071


#### Can you create a single row dataframe? 

In [None]:
pd.DataFrame(['Afghanistan',1952, 8425333, 'Asia', 28.801, 779.445314 ], 
             columns=['country','year','population','continent','life_exp','gdp_cap'])

ValueError

#### Why did this give an error?


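- Because we passed a **flat list of values**
  - pandas treats a flat list as a **single column** (6 rows × 1 column), which does not match the **6 column labels** we supplied

- `DataFrame()` expects a **list of rows**, so even a single row must be wrapped in an outer list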
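A small sketch of what pandas does with the flat list: it becomes one column, so the 6 labels don't fit, while wrapping the same values in an outer list produces one row. The names `values`, `labels`, and `raised` are illustrative.

```python
import pandas as pd

values = ["Afghanistan", 1952, 8425333, "Asia", 28.801, 779.445314]
labels = ["country", "year", "population", "continent", "life_exp", "gdp_cap"]

# A flat list is read as ONE column with 6 rows
as_column = pd.DataFrame(values)
print(as_column.shape)  # (6, 1)

# ...so 6 column labels don't fit, and pandas raises ValueError
raised = False
try:
    pd.DataFrame(values, columns=labels)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Wrapping the row in an outer list gives 1 row x 6 columns
as_row = pd.DataFrame([values], columns=labels)
print(as_row.shape)     # (1, 6)
```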

In [None]:
pd.DataFrame([['Afghanistan',1952, 8425333, 'Asia', 28.801, 779.445314 ]], 
             columns=['country','year','population','continent','life_exp','gdp_cap'])

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314


### Approach 2: Column-oriented


In [None]:
pd.DataFrame({'country': ['Afghanistan', 'Afghanistan'], 'year': [1952, 1957],
              'population': [8425333, 9240934], 'continent': ['Asia', 'Asia'],
              'life_exp': [28.801, 30.332], 'gdp_cap': [779.445314, 820.853030]})

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303


We **pass the data** as a **dictionary**

- **Key** is the **Column Name/Label**


- **Value** is the **list of values column-wise**
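Both approaches describe the same table, just organised differently. A quick sketch to check this with `DataFrame.equals()`, which compares values, dtypes, and labels; the names `rows_df` and `cols_df` are illustrative.

```python
import pandas as pd

labels = ["country", "year", "population", "continent", "life_exp", "gdp_cap"]

# Approach 1: row-oriented (list of rows + column labels)
rows_df = pd.DataFrame([["Afghanistan", 1952, 8425333, "Asia", 28.801, 779.445314],
                        ["Afghanistan", 1957, 9240934, "Asia", 30.332, 820.853030]],
                       columns=labels)

# Approach 2: column-oriented (dict of column name -> list of values)
cols_df = pd.DataFrame({"country": ["Afghanistan", "Afghanistan"], "year": [1952, 1957],
                        "population": [8425333, 9240934], "continent": ["Asia", "Asia"],
                        "life_exp": [28.801, 30.332], "gdp_cap": [779.445314, 820.853030]})

print(rows_df.equals(cols_df))  # True
```

Row-oriented input is natural when data arrives record by record; column-oriented input is handy when each column already exists as its own list.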


We now have a basic idea about the dataset and about creating DataFrames from rows or columns


What kind of **other operations** can we perform on the dataframe?

Thinking from a database perspective:
- Adding data
- Removing data
- Updating/Modifying data

and so on

## Basic operations on columns



We can see that our dataset has 6 cols

#### But what if our dataset has 20 cols ? ... or 100 cols ? We can't see their names in **one go**.

#### How can we get the names of all these cols ?

We can do it in two ways:
  1. `df.columns`
  2. `df.keys()`



In [None]:
df.columns  # using attribute `columns` of dataframe

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

In [None]:
df.keys()  # using method keys() of dataframe

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

Note:

- Here, `Index` is the pandas class that holds the **axis labels** of a Series/DataFrame: the row labels, or, as here, the column names

- It is an **immutable** sequence used for indexing and alignment.
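A small sketch of what "immutable" means in practice: assigning to a single position of an `Index` raises a `TypeError`. If you need the names as an ordinary Python object, convert with `list()`. The names `cols`, `names`, and `blocked` are illustrative.

```python
import pandas as pd

cols = pd.Index(["country", "year", "population"])

# Index objects are immutable: item assignment raises TypeError
blocked = False
try:
    cols[0] = "Country"
except TypeError as err:
    blocked = True
    print("TypeError:", err)

# Convert to a plain list if you need the names as a regular Python list
names = list(cols)
print(names)  # ['country', 'year', 'population']
```

To change column names, use `rename()` (covered below) rather than editing the `Index` in place.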





In [None]:
# df['country'].head()  # Gives values in Top 5 rows pertaining to the key

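Pandas DataFrames and Series behave like **specialised dictionaries**: in a DataFrame, the **column names are the keys** and each key maps to a Series

#### But what is so "special" about this dictionary?

Unlike a regular dictionary, we can pass a **list of keys** to select multiple columns at once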


In [None]:
df[['country', 'life_exp']].head() 

,country,life_exp
0,Afghanistan,28.801
1,Afghanistan,30.332
2,Afghanistan,31.997
3,Afghanistan,34.02
4,Afghanistan,36.088


And what if we pass a single column name?

In [None]:
df[['country']].head() 

,country
0,Afghanistan
1,Afghanistan
2,Afghanistan
3,Afghanistan
4,Afghanistan


Note:

Notice how this output type is different from our earlier output using `df['country']`

==> `df['country']` gives a **Series**, while `df[['country']]` gives a **DataFrame**
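The difference also shows up in the shapes: a Series is one-dimensional, while the single-column DataFrame keeps two dimensions. A small sketch on a hand-made DataFrame (names are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"country": ["Afghanistan", "Afghanistan"],
                       "life_exp": [28.801, 30.332]})

one_bracket = sample["country"]     # a Series
two_brackets = sample[["country"]]  # a DataFrame with one column

print(type(one_bracket).__name__, one_bracket.shape)    # Series (2,)
print(type(two_brackets).__name__, two_brackets.shape)  # DataFrame (2, 1)
```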

Now that we know how to access columns, let's answer some questions

### How can we find the countries that have been surveyed ?

We can find the unique vals in the `country` col

#### How can we find unique values in a column?

In [None]:
df['country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Belgium',
       'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia',
       'Finland', 'France', 'Gabon', 'Gambia', 'Germany', 'Ghana',
       'Greece', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Haiti',
       'Honduras', 'Hong Kong, China', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kenya', 'Korea, Dem. Rep.',
       'Korea, Rep.', 'Kuwait', 'Leba
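To answer "how many countries were surveyed?" directly, `nunique()` counts the distinct values (equivalent to `len(...unique())`). A small sketch on a hand-made column; on the full `df`, this count should match the `Length: 142` reported by `value_counts()` below.

```python
import pandas as pd

countries = pd.Series(["Afghanistan", "Afghanistan", "Albania", "Zimbabwe", "Zimbabwe"])

print(countries.unique())       # distinct values, in order of first appearance
print(countries.nunique())      # 3
print(len(countries.unique()))  # 3, the same count
```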

#### Now what if you also want to check the count of each country in the dataframe?


In [None]:
df['country'].value_counts()

Afghanistan          12
Pakistan             12
New Zealand          12
Nicaragua            12
Niger                12
                     ..
Eritrea              12
Equatorial Guinea    12
El Salvador          12
Egypt                12
Zimbabwe             12
Name: country, Length: 142, dtype: int64

Note: 

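`value_counts()` shows the output in **decreasing order of frequency**. Here every country appears **12 times** (1704 rows / 142 countries = 12 survey years each), so the ordering among them is not meaningful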
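The ordering is easier to see on data with unequal counts. A small sketch on a hand-made Series (no ties, so the order is fixed); `normalize=True` returns proportions instead of raw counts.

```python
import pandas as pd

continents = pd.Series(["Asia", "Africa", "Africa", "Europe", "Africa", "Asia"])

counts = continents.value_counts()  # most frequent value first
print(counts)                       # Africa 3, Asia 2, Europe 1

# normalize=True gives the share of each value instead of its count
print(continents.value_counts(normalize=True))
```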

### What if we want to change the name of a column ?

We can rename the column by:
- passing a dictionary of `old_name: new_name` pairs
- specifying `axis=1`

In [None]:
df.rename({"population": "Population", "country":"Country" }, axis = 1)

,Country,year,Population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


Alternatively, we can rename columns without using `axis`
- by passing the dictionary to the `columns` parameter



In [None]:
df.rename(columns={"country":"Country"})

,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


`rename()` **returns a new DataFrame** and leaves `df` unchanged. To modify `df` itself, set the `inplace` argument to `True`

In [None]:
df.rename({"country": "Country"}, axis = 1, inplace = True)
df

,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


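**Note**
- `rename()` uses `axis=0` by default, i.e., it renames **row (index) labels** unless you pass `axis=1` or use the `columns` parameter
- If two columns have the **same name**, then `df['column']` returns **both columns** (as a DataFrame)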
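Both points in a small sketch on hand-made data: without `axis=1`, `rename()` looks for `"country"` among the row labels, finds nothing, and silently returns an unchanged copy; and a duplicated column name selects every matching column. The names here are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({"country": ["Afghanistan"], "year": [1952]})

# Default axis=0: "country" is looked up among the ROW labels, so nothing changes
unchanged = sample.rename({"country": "Country"})
print(list(unchanged.columns))  # ['country', 'year']

renamed = sample.rename({"country": "Country"}, axis=1)
print(list(renamed.columns))    # ['Country', 'year']

# Duplicate column names: selecting the name returns every matching column
dup = pd.DataFrame([[1, 2]], columns=["a", "a"])
print(type(dup["a"]).__name__, dup["a"].shape)  # DataFrame (1, 2)
```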

Now let's try another way of accessing column values: **attribute-style** access

In [None]:
df.Country

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: Country, Length: 1704, dtype: object

However, this doesn't work every time

#### What do you think could be the problems with using attribute style for accessing the columns?

**Problems** such as
- if the column names are **not valid Python identifiers**
  - Starting with a **number**: E.g., ```2nd```
  - Containing a **space**: E.g., ```Roll Number```
- or if the column names conflict with **existing attributes or methods of the DataFrame**
  - E.g. ```shape``` 
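A small sketch of both problems, using hypothetical column names chosen to trigger them. Bracket notation works in every case, while attribute access either fails to parse or returns something other than the column.

```python
import pandas as pd

# Hypothetical columns: one name with a space, one clashing with DataFrame.shape
students = pd.DataFrame({"Roll Number": [1, 2], "shape": ["circle", "square"]})

# students.Roll Number would be a SyntaxError; brackets work fine
print(students["Roll Number"].tolist())  # [1, 2]

# students.shape is the DataFrame's own attribute, not the column
print(students.shape)                    # (2, 2)
print(students["shape"].tolist())        # ['circle', 'square']
```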

It is generally better to avoid this style and use **bracket notation**, e.g. `df['Country']`, which works for any column name

#### Are all the columns in our data necessary?
  
  - We already know the continents in which each country lies
  - So we don't need this column


### How can we delete cols in pandas dataframe ?


In [None]:
df.drop('continent', axis=1)

,Country,year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623


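The `drop` function takes two parameters:
  - The label(s) to drop: here, the column name
  - The axis: `axis=1` means "look for the label among the columns"
  
By default the value of `axis` is 0, which drops **rows** by their index label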
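A small sketch on hand-made data showing how `axis` changes where `drop()` looks for the label (names are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"year": [1952, 1957, 1962], "continent": ["Asia", "Asia", "Asia"]})

# Default axis=0: the label is looked up among the row (index) labels
without_row_0 = sample.drop(0)
print(without_row_0.index.tolist())  # [1, 2]

# axis=1: the label is looked up among the column names
without_col = sample.drop("continent", axis=1)
print(list(without_col.columns))     # ['year']
```

Calling `sample.drop("continent")` without `axis=1` would raise a `KeyError`, since no row is labelled `"continent"`.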



An alternative is to use the `columns` parameter, as we did with `rename()`

In [None]:
df.drop(columns=['continent'])

,Country,year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623


As you can see, **column `continent` is dropped**

#### Has the column permanently been deleted?

In [None]:
df.head()

,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


No, the **column `continent` is still there**

**Do you see what's happening here?**

`drop()` **returned a new DataFrame** without the column `continent`; the original `df` was left unchanged

#### How can we permanently drop the column?

We can either **re-assign** it
- `df = df.drop('continent', axis=1)`

  OR
    
- We can **set parameter `inplace=True`**
    
By **default, `inplace=False`**
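A small sketch of the two options on hand-made data. Note that with `inplace=True`, `drop()` modifies the DataFrame and returns `None`, so the two options should not be combined.

```python
import pandas as pd

sample = pd.DataFrame({"year": [1952, 1957], "continent": ["Asia", "Asia"]})

# Option 1: keep the returned DataFrame by (re-)assigning it
reassigned = sample.drop("continent", axis=1)

# Option 2: modify sample itself; with inplace=True, drop() returns None
result = sample.drop("continent", axis=1, inplace=True)

print(result)                # None
print(list(sample.columns))  # ['year']
```

A common pitfall is writing `df = df.drop(..., inplace=True)`, which leaves `df` set to `None`.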

In [None]:
df.drop('continent', axis=1, inplace=True)

In [None]:
df.head()  # print the head to check

,Country,year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.85303
2,Afghanistan,1962,10267083,31.997,853.10071
3,Afghanistan,1967,11537966,34.02,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106


Now we can see the column `continent` is permanently dropped

### Now similarly, what if we want to create a new column?

We can either
- use values from **existing columns** 

OR 
- create our **own values**

#### How to create a column using values from an existing column?

In [None]:
df["year+7"] = df["year"] + 7
df.head()

,Country,year,population,life_exp,gdp_cap,year+7
0,Afghanistan,1952,8425333,28.801,779.445314,1959
1,Afghanistan,1957,9240934,30.332,820.85303,1964
2,Afghanistan,1962,10267083,31.997,853.10071,1969
3,Afghanistan,1967,11537966,34.02,836.197138,1974
4,Afghanistan,1972,13079460,36.088,739.981106,1979


As we see, a new column `year+7` is created from the column `year`

We can also use values from two columns to form a new column

#### Which two columns can we use to create a new column `gdp`?

In [None]:
df['gdp']=df['gdp_cap'] * df['population']
df.head()

,Country,year,population,life_exp,gdp_cap,year+7,gdp
0,Afghanistan,1952,8425333,28.801,779.445314,1959,6567086000.0
1,Afghanistan,1957,9240934,30.332,820.85303,1964,7585449000.0
2,Afghanistan,1962,10267083,31.997,853.10071,1969,8758856000.0
3,Afghanistan,1967,11537966,34.02,836.197138,1974,9648014000.0
4,Afghanistan,1972,13079460,36.088,739.981106,1979,9678553000.0


#### As you can see

- An **additional column** has been **created**

- **Values** in this column are **product of respective values in `gdp_cap` and `population`**


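#### What other operations can we use?

Subtraction, addition, division, etc.: any **element-wise** arithmetic between two columns, or between a column and a single number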
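A small sketch of these element-wise operations on two rows copied from the dataset. Each operation is applied row by row; the new column names are illustrative.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"year": [1952, 1957],
                       "population": [8425333, 9240934],
                       "life_exp": [28.801, 30.332],
                       "gdp_cap": [779.445314, 820.853030]})

sample["gdp"] = sample["gdp_cap"] * sample["population"]          # column * column
sample["gdp_cap_check"] = sample["gdp"] / sample["population"]    # column / column
sample["life_exp_gain"] = sample["life_exp"] - sample["life_exp"].iloc[0]  # column - number
sample["pop_millions"] = sample["population"] / 1_000_000         # column / number

print(sample)
```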

### How can we create a new column from our own values?

- We can **create a list**

OR

- We can **create a Pandas Series** from a list/numpy array for our new column

In [None]:
df["Own"] = list(range(len(df)))  # the list must have exactly one value per row (1704 here)
df

,Country,year,population,life_exp,gdp_cap,year+7,gdp,Own
0,Afghanistan,1952,8425333,28.801,779.445314,1959,6.567086e+09,0
1,Afghanistan,1957,9240934,30.332,820.853030,1964,7.585449e+09,1
2,Afghanistan,1962,10267083,31.997,853.100710,1969,8.758856e+09,2
3,Afghanistan,1967,11537966,34.020,836.197138,1974,9.648014e+09,3
4,Afghanistan,1972,13079460,36.088,739.981106,1979,9.678553e+09,4
...,...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306,1994,6.508241e+09,1699
1700,Zimbabwe,1992,10704340,60.377,693.420786,1999,7.422612e+09,1700
1701,Zimbabwe,1997,11404948,46.809,792.449960,2004,9.037851e+09,1701
1702,Zimbabwe,2002,11926563,39.989,672.038623,2009,8.015111e+09,1702
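Lists and Series behave differently when assigned as a new column. A small sketch on hand-made data: a list or NumPy array is matched **by position** (so its length must equal the number of rows), while a Series is matched **by index label**, and rows missing from it get `NaN`. The column names here are illustrative.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"year": [1952, 1957, 1962]})

# A list or NumPy array is matched by position, so its length must equal len(sample)
sample["from_list"] = [10, 20, 30]
sample["from_array"] = np.arange(len(sample))

# A Series is matched by index label; labels missing from it become NaN
sample["from_series"] = pd.Series([100, 300], index=[0, 2])
print(sample)

# A list of the wrong length raises ValueError
length_error = False
try:
    sample["too_short"] = [1, 2]
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```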


Now that we know how to create new cols, let's see some basic ops on rows

Before that, let's drop the newly created cols


In [None]:
df.drop(columns=["Own", "gdp", "year+7"], inplace=True)
df

,Country,year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623
