
**DataFrame:** A DataFrame is similar to an in-memory spreadsheet. Like a spreadsheet:

*   A DataFrame stores data in cells.
*   A DataFrame has named columns (usually) and numbered rows.




**Import NumPy and pandas modules**

In [5]:
import numpy as np

In [6]:
import pandas as pd

**Creating a DataFrame:**
The following code cell creates a simple DataFrame containing 10 cells organized as follows:
*   5 rows
*   2 columns, one named temperature and the other named activity

The following code cell instantiates the pd.DataFrame class to generate a DataFrame. The call passes two arguments:
*   The first argument, data, provides the values that populate the 10 cells. The code cell calls np.array to generate the 5x2 NumPy array.
*   The second argument, columns, supplies the names of the two columns.


In [7]:
my_data = np.array([[0,3],[10,7],[20,9],[30,14],[40,15]])

In [8]:
my_column_names=['temperature','activity']

In [9]:
my_dataframe=pd.DataFrame(data=my_data, columns=my_column_names)
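
As a rough sketch of the two arguments described above, the cell below rebuilds the same 5x2 data and checks its shape and column names. The names from_array and from_dict are hypothetical, and the dict-based construction is shown only as an alternative, not as the notebook's method.

```python
import numpy as np
import pandas as pd

# Same 5x2 values as the notebook's my_data.
my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

# data= supplies the cell values, columns= supplies the column names.
from_array = pd.DataFrame(data=my_data, columns=["temperature", "activity"])

# Alternative (assumption, not used in the notebook): a dict that maps
# each column name to the list of values in that column.
from_dict = pd.DataFrame({
    "temperature": [0, 10, 20, 30, 40],
    "activity": [3, 7, 9, 14, 15],
})

print(from_array.shape)           # (5, 2): 5 rows, 2 columns
print(list(from_array.columns))   # ['temperature', 'activity']

# Both constructions hold the same values.
print((from_array.to_numpy() == from_dict.to_numpy()).all())
```

Passing a NumPy array keeps the row-by-row layout visible, while the dict form keeps each column's values next to its name.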

In [10]:
print(my_dataframe)

   temperature  activity
0            0         3
1           10         7
2           20         9
3           30        14
4           40        15

**Adding a new column to a DataFrame**

You can add a new column to an existing pandas DataFrame just by assigning values to a new column name. For example, the following code creates a third column named adjusted in my_dataframe:

In [11]:
my_dataframe["adjusted"]=my_dataframe["activity"]+2


In [12]:
print(my_dataframe)

   temperature  activity  adjusted
0            0         3         5
1           10         7         9
2           20         9        11
3           30        14        16
4           40        15        17
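
Column assignment changes the DataFrame in place. The sketch below contrasts that with DataFrame.assign, which returns a new DataFrame instead. The names df_demo and with_doubled and the doubled column are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)

# Assigning to a new column name adds the column to df_demo itself.
df_demo["adjusted"] = df_demo["activity"] + 2

# assign() returns a new DataFrame with the extra column;
# df_demo is left unchanged.
with_doubled = df_demo.assign(doubled=df_demo["activity"] * 2)

print(list(df_demo.columns))       # ['temperature', 'activity', 'adjusted']
print(list(with_doubled.columns))  # ['temperature', 'activity', 'adjusted', 'doubled']
```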


**Specifying a subset of a DataFrame**

pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame.

In [13]:
print("Rows #0, #1, and #2")

Rows #0, #1, and #2


In [14]:
print(my_dataframe.head(3),'\n')

   temperature  activity  adjusted
0            0         3         5
1           10         7         9
2           20         9        11 



In [15]:
print("Row #2")

Row #2


In [16]:
print(my_dataframe.iloc[[2]],'\n')

   temperature  activity  adjusted
2           20         9        11 



In [17]:
print("Rows #1, #2, and #3")

Rows #1, #2, and #3


In [18]:
print(my_dataframe[1:4],'\n')

   temperature  activity  adjusted
1           10         7         9
2           20         9        11
3           30        14        16 



In [19]:
print("Column 'temperature':")

Column 'temperature':


In [20]:
print(my_dataframe['temperature'])

0     0
1    10
2    20
3    30
4    40
Name: temperature, dtype: int64
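
The cells above cover rows, slices, and a single column. A minimal sketch of the remaining cases (several columns at once and a single cell) might look like the following. df_demo is a hypothetical stand-in for my_dataframe.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)
df_demo["adjusted"] = df_demo["activity"] + 2

# Several columns: pass a list of column names.
print(df_demo[["temperature", "adjusted"]])

# Rows by label with .loc: the end label IS included (rows 1, 2, 3),
# unlike the positional slice df_demo[1:4], which stops before 4.
print(df_demo.loc[1:3])

# A single cell: .at takes (row label, column name),
# .iloc takes (row position, column position).
print(df_demo.at[3, "activity"])   # 14
print(df_demo.iloc[3, 1])          # 14
```

With the default integer index, row labels and row positions coincide, so df_demo.loc[1:3] and df_demo[1:4] select the same rows here.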


**Create a DataFrame**


1. Create a 3x4 (3 rows x 4 columns) pandas DataFrame in which the columns are named Aayushi, Nikita, Vipul and Sonal. Populate each of the 12 cells in the DataFrame with a random integer between 0 and 100, inclusive.

2. Output the following:
* the entire DataFrame
* the value in the cell of row #1 of the Aayushi column
3. Create a fifth column named Janet, which is populated with the row-by-row sums of Vipul and Sonal.

In [21]:
my_column_name=['Aayushi','Nikita','Vipul','Sonal']

In [22]:
my_data=np.random.randint(low=0,high=101,size=(3,4))

In [23]:
df=pd.DataFrame(data=my_data,columns=my_column_name)

In [24]:
print(df)

   Aayushi  Nikita  Vipul  Sonal
0       37      22     72     70
1       51      21     24     21
2       35     100     54     87


In [25]:
print("\n Second row of the Aayushi column: %d\n" %df['Aayushi'][1])


 Second row of the Aayushi column: 51



In [26]:
df['Janet']=df['Vipul'] + df['Sonal']

In [27]:
print(df)

   Aayushi  Nikita  Vipul  Sonal  Janet
0       37      22     72     70    142
1       51      21     24     21     45
2       35     100     54     87    141
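
Because the values are random, rerunning the cells above produces different numbers each time. One possible way to make the exercise repeatable is a seeded generator, sketched below. The seed value 0 and the names rng and scores are arbitrary assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator (seed chosen arbitrarily) so reruns give the same values.
rng = np.random.default_rng(seed=0)

names = ["Aayushi", "Nikita", "Vipul", "Sonal"]

# high is exclusive, so high=101 yields integers from 0 to 100 inclusive.
scores = pd.DataFrame(rng.integers(low=0, high=101, size=(3, 4)), columns=names)

# Row-by-row sum of two columns, stored as a fifth column.
scores["Janet"] = scores["Vipul"] + scores["Sonal"]

print(scores)
print("Row #1 of the Aayushi column:", scores.at[1, "Aayushi"])
```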


In [28]:
print("Experiment with a reference:")
reference_to_df=df

Experiment with a reference:


In [29]:
print(" Starting value of df: %d" %df['Sonal'][1])

 Starting value of df: 21


In [30]:
print(" Starting value of reference_to_df: %d\n" %reference_to_df['Sonal'][1])

 Starting value of reference_to_df: 21



In [31]:
df.at[1,'Sonal']=df['Sonal'][1]+5

In [32]:
print("Updated df : %d" %df['Sonal'][1])

Updated df : 26


In [35]:
print("Updated reference_to_df : %d\n\n" %reference_to_df['Sonal'][1])

Updated reference_to_df : 26




**Copying a DataFrame** 

Pandas provides two different ways to duplicate a DataFrame:

* **Referencing** - If you assign a DataFrame to a new variable, both names refer to the same object, so any change made through one name is reflected in the other.
* **Copying** - If you call the pd.DataFrame.copy method, you create a true independent copy. Changes to the original DataFrame or to the copy will not be reflected in the other.

The difference is subtle, but important.
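
The two behaviours can be compared side by side in a small sketch. The names original, alias, and independent are hypothetical and used only for this illustration.

```python
import pandas as pd

original = pd.DataFrame({"activity": [3, 7, 9]})

alias = original               # referencing: a second name for the same object
independent = original.copy()  # copying: a separate object with its own data

# Change one cell through the original name.
original.at[1, "activity"] = 10

print(alias is original)              # True: same object
print(independent is original)        # False: different object
print(alias.at[1, "activity"])        # 10: the alias sees the change
print(independent.at[1, "activity"])  # 7: the copy keeps the old value
```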

In [36]:
print("Experiment with a true copy:")

Experiment with a true copy:


In [37]:
copy_of_my_dataframe=my_dataframe.copy()

In [38]:
print(" Starting value of my_dataframe: %d" %my_dataframe['activity'][1])

 Starting value of my_dataframe: 7


In [39]:
print(" Starting value of copy_of_my_dataframe: %d" %copy_of_my_dataframe['activity'][1])

 Starting value of copy_of_my_dataframe: 7


In [40]:
my_dataframe.at[1,'activity']=my_dataframe['activity'][1]+3

In [41]:
print("Updated my_dataframe: %d" %my_dataframe['activity'][1])

Updated my_dataframe: 10


In [42]:
print("copy_of_my_dataframe does not get updated %d" %copy_of_my_dataframe['activity'][1])

copy_of_my_dataframe does not get updated 7
