<a href="https://colab.research.google.com/github/dscoool/waterai/blob/main/Pandas_Dataframe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using the Pandas Data Frame as a Database.

## Let us understand how to use the pandas data frame as a database.

![alt text](https://miro.medium.com/max/2400/1*9v51-jsfHtk6fgAIYLoiHQ.jpeg)

Before starting, a quick word about pandas: it is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. Its central structure is the data frame (`DataFrame`), a two-dimensional table with labeled rows and columns. Below is an article that explains the basic manipulations performed on a pandas data frame.

[Pandas Data Frame](https://towardsdatascience.com/manipulating-the-data-with-pandas-using-python-be6c5dfabd47?source=post_page-----282edec5a3ab----------------------)

Let’s get started. This is a programming tutorial, so I recommend you practice side by side with me. I favor Google Colab or Jupyter notebooks. In brief, I will show you how to use the pandas data frame like a database: storing data and performing some rudimentary operations on it. A data frame supports the row-level operations you would expect from a database table (create, read, update, delete), although it lives in memory rather than on disk.

**Steps that will be followed in this tutorial are**
1. Creating a pandas data frame
2. Adding a row to the data frame
3. Deleting a row from the data frame
4. Accessing the value of a row from the data frame
5. Changing the value of a row in the data frame

**Let’s see how we can perform each of the steps listed above**

## 1. Creating a pandas data frame

To create a data frame, first import pandas, then specify the column names and their values as shown below:

In [None]:
import pandas as pd

Let’s create a new data frame. I am storing the company name, Founders, Founded and Number of Employees. You can store the data of your choice inside the data frame.

In [None]:
df = pd.DataFrame({'Company Name':['Google', 'Microsoft', 'SpaceX', 'Amazon', 'Samsung'],
                   'Founders':['Larry Page, Sergey Brin', 'Bill Gates, Paul Allen','Elon Musk','Jeff Bezos', 'Lee Byung-chul'],
                   'Founded': [1998, 1975, 2002, 1994, 1938],
                   'Number of Employees': ['103,459', '144,106', '6,500', '647,500', '320,671']})
df

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
3,Amazon,Jeff Bezos,1994,647500
4,Samsung,Lee Byung-chul,1938,320671


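Don’t worry, there is nothing complicated here: each key is a column name (Company Name, Founders, Founded, and so on) and each list holds that column’s values. Note that the employee counts are entered as strings such as '103,459', so pandas stores them as text rather than numbers. Be careful with the brackets; a mismatched bracket or brace will raise a syntax error.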
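Because the employee counts are strings, you cannot sum or sort them numerically as they stand. Here is a minimal sketch of one way to get a numeric column, assuming every value uses a comma as the thousands separator. The column name `Employees (int)` is my own choice for illustration and is not used elsewhere in this tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    'Company Name': ['Google', 'Microsoft', 'SpaceX'],
    'Founded': [1998, 1975, 2002],
    'Number of Employees': ['103,459', '144,106', '6,500'],
})

# Strip the thousands separators, then convert the text to integers.
employees = df['Number of Employees'].str.replace(',', '', regex=False).astype(int)

# 'Employees (int)' is a hypothetical column name chosen for this sketch.
df['Employees (int)'] = employees

total = int(df['Employees (int)'].sum())
print(df.dtypes)
print(total)
```

The original string column is left untouched, so the rest of the tutorial still works on it.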


---



## 2. Adding a new row to the data frame

Suppose you now want to add a new row to the data frame. You can add it at the end of the data frame or at a specific location of your choice.

**Case 1: Adding a row at the end of the data frame:**

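To append the row at the end of the data frame, older versions of pandas offer the “**append method**”, which takes the rows you want to append. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, `pd.concat` does the same job. Below is the official documentation of the append function: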

[Append Method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html?source=post_page-----282edec5a3ab----------------------)

Let's create a new data frame with new values and then append that to the end of the existing data frame.

In [None]:
df1 = pd.DataFrame({'Company Name':['WhatsApp'], 'Founders':['Jan Koum, Brian Acton'], 'Founded': [2009], 'Number of Employees': ['50'] })
df1

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,WhatsApp,"Jan Koum, Brian Acton",2009,50


df1 is a new data frame that we want to append to the existing data frame df.

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call.
df = pd.concat([df, df1], ignore_index=True)
df

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
3,Amazon,Jeff Bezos,1994,647500
4,Samsung,Lee Byung-chul,1938,320671
5,WhatsApp,"Jan Koum, Brian Acton",2009,50
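If you only have a single row to add, you do not need a second data frame. Below is a small sketch of an alternative that assigns to a new label through `loc`. It assumes the data frame still has the default 0, 1, 2, ... index, so that `len(df)` is the next unused label.

```python
import pandas as pd

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft'],
                   'Founded': [1998, 1975]})

# With a default 0..n-1 index, len(df) is the next unused label,
# so assigning to it enlarges the data frame by one row.
df.loc[len(df)] = ['WhatsApp', 2009]
print(df)
```

The values must be listed in the same order as the columns.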


**Case 2: Adding a new row at a specific location**

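Let us now put a new row of values at index 3, the position just below “SpaceX”. To do this, we can use the pandas “iloc” indexer, specifying the position and the values to assign. Note that assigning through iloc overwrites the row that is already at that position (here, Amazon) rather than shifting the existing rows down. Below is the documentation of iloc: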

[iloc method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html?source=post_page-----282edec5a3ab----------------------)

In [None]:
df.iloc[3] = ['YouTube', 'Chad Hurley, Steve Chen, Jawed Karim', 2005, '2,800']
df

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
3,YouTube,"Chad Hurley, Steve Chen, Jawed Karim",2005,2800
4,Samsung,Lee Byung-chul,1938,320671
5,WhatsApp,"Jan Koum, Brian Acton",2009,50


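With iloc we can overwrite the row at any position in the data frame. As the output shows, the YouTube row has replaced the Amazon row at index 3, and the number of rows is unchanged.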
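If you want to insert a row at a position and keep every existing row, one option is to split the data frame at that position and join the pieces with `pd.concat`. The helper `insert_row` below is a hypothetical function written for this sketch, not part of pandas.

```python
import pandas as pd

def insert_row(frame, position, row):
    """Return a new data frame with `row` (a dict) inserted before `position`."""
    new_row = pd.DataFrame([row], columns=frame.columns)
    # Rows before the position, the new row, then the remaining rows.
    parts = [frame.iloc[:position], new_row, frame.iloc[position:]]
    # ignore_index=True renumbers the result 0, 1, 2, ...
    return pd.concat(parts, ignore_index=True)

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft', 'SpaceX', 'Amazon'],
                   'Founded': [1998, 1975, 2002, 1994]})

df2 = insert_row(df, 3, {'Company Name': 'YouTube', 'Founded': 2005})
print(df2)
```

Here YouTube lands at index 3 and Amazon moves down to index 4. The original `df` is not modified.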


---



## 3. Deleting a row from the data frame

Sometimes you need to remove unnecessary data from the database or data frame. The “**drop method**” in pandas gets the job done. Let’s look at two cases: deleting a row by its index and deleting a row by one of its values. Below is the documentation of the pandas drop method:

[Pandas drop method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html?source=post_page-----282edec5a3ab----------------------)

**Case 1: Deleting a row with its index**

This can be done by passing the index label to the drop method:

In [None]:
df = df.drop([df.index[5]])
df

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
3,YouTube,"Chad Hurley, Steve Chen, Jawed Karim",2005,2800
4,Samsung,Lee Byung-chul,1938,320671


As seen above, the row at index 5, i.e. the **WhatsApp** row, was removed completely.
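You may wonder why the call is written as `df.index[5]` rather than just `5`. The drop method works on index labels, and `df.index[n]` looks up the label at position n. With a default index the two are the same, but they differ once the index has gaps. Here is a small sketch with a made-up index that skips 3:

```python
import pandas as pd

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft', 'SpaceX', 'Samsung']},
                  index=[0, 1, 2, 4])

# df.index[3] is the label at position 3, which is 4 here.
by_position = df.drop(df.index[3])

# Dropping by the label itself gives the same result.
by_label = df.drop(4)

# There is no row labeled 3, so drop raises a KeyError.
missing = False
try:
    df.drop(3)
except KeyError:
    missing = True

print(by_position)
print(missing)
```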

**Case 2: Deleting a row with the help of a value.**

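Now let us see how we can delete a row by its value. Strictly speaking, the line below keeps every row whose Founders value differs from the given one, which has the same effect.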

In [None]:
df = df[df.Founders != 'Chad Hurley, Steve Chen, Jawed Karim']
df

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
4,Samsung,Lee Byung-chul,1938,320671


Now the **YouTube** row is deleted by filtering on its value. Note that the remaining index labels (0, 1, 2, 4) are not renumbered.
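After a deletion like this, the index keeps its old labels, so there is a gap where the removed row used to be. If you prefer a clean 0, 1, 2, ... numbering, `reset_index(drop=True)` is one way to get it. Here is a short sketch on a smaller data frame; the variable names `filtered` and `renumbered` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft', 'SpaceX', 'YouTube', 'Samsung'],
                   'Founded': [1998, 1975, 2002, 2005, 1938]})

# Keep every row except YouTube; the old labels are preserved.
filtered = df[df['Company Name'] != 'YouTube']
print(filtered.index.tolist())

# drop=True discards the old labels instead of keeping them as a column.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())
```

The rest of this tutorial does not reset the index, which is why Samsung keeps the label 4 in the outputs that follow.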



---



## 4. Accessing the value of a row from the data frame
Accessing a value in the data frame is straightforward: use “loc”. loc is a label-based indexer, so you pass it the index label of the row in square brackets to retrieve that row’s values. Below is the documentation of loc:

[loc method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html?source=post_page-----282edec5a3ab----------------------)

Now, suppose we need to access the values of the 3rd row. To do so, call **loc[2]** and your job is done. Index labels start at 0, hence the 3rd row has index 2 (0, 1, 2).

In [None]:
df.loc[2]

Company Name              SpaceX
Founders               Elon Musk
Founded                     2002
Number of Employees        6,500
Name: 2, dtype: object
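Keep in mind that loc looks rows up by label, while iloc looks them up by position. The two agree for SpaceX here, but not for Samsung, whose label is 4 while its position is 3. A minimal sketch, using a reduced data frame with the same index labels as above:

```python
import pandas as pd

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft', 'SpaceX', 'Samsung']},
                  index=[0, 1, 2, 4])

by_label = df.loc[4]       # the row whose index label is 4
by_position = df.iloc[3]   # the 4th row, counting from position 0

# loc also accepts a column name to fetch a single cell.
cell = df.loc[2, 'Company Name']

print(by_label['Company Name'], by_position['Company Name'], cell)
```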



---



## 5. Changing the value of a row in the data frame
This can be done with the “at” accessor. To use “at”, specify the row’s index label and the column name, then assign the new value. Below is the documentation of “**at**”:

[At method](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.at.html?source=post_page-----282edec5a3ab----------------------)

For example, let's think that I want to change the number of employees of Microsoft to 200,000 (I am sorry Microsoft I have to do this for my tutorial).

In [None]:
df  # This is before the change of value (Microsoft)

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,144106
2,SpaceX,Elon Musk,2002,6500
4,Samsung,Lee Byung-chul,1938,320671


In [None]:
df.at[1, 'Number of Employees'] = '200,000'
df # This is after the change of value (Microsoft)

Unnamed: 0,Company Name,Founders,Founded,Number of Employees
0,Google,"Larry Page, Sergey Brin",1998,103459
1,Microsoft,"Bill Gates, Paul Allen",1975,200000
2,SpaceX,Elon Musk,2002,6500
4,Samsung,Lee Byung-chul,1938,320671
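“at” needs the index label of the row, which you may not always know. As a rough sketch of an alternative, you can select the row by a condition and update it through loc, much like an UPDATE ... WHERE statement in a database. The variable name `mask` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Company Name': ['Google', 'Microsoft', 'SpaceX'],
                   'Number of Employees': ['103,459', '144,106', '6,500']})

# True for the rows to update, False for the rest.
mask = df['Company Name'] == 'Microsoft'

# Assign the new value only where the mask is True.
df.loc[mask, 'Number of Employees'] = '200,000'
print(df)
```

If several rows match the condition, all of them are updated.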


There you go, this is how you can use the pandas data frame as a database. You have successfully completed all the steps of the tutorial. I hope you have enjoyed reading it. There are plenty more operations, functions, and methods that can be performed on a data frame. I cannot explain every operation in one stretch, so let me know if you need more tutorials about these concepts. Don’t forget to read the official pandas documentation links that I have provided in this tutorial. Until then, see you. Have a good day.





---

