In [1]:
import pandas as pd

import numpy as np

# What is the pandas library?

- Pandas is an open-source Python library used for data manipulation and analysis. It consists of data structures and functions for performing efficient operations on data, and it is well suited to tabular data such as spreadsheets or SQL tables. It is built on top of the NumPy library, which makes numerical data easier to manipulate and analyze, and it uses many functionalities that NumPy provides. It is widely used in data science because it works well with other important libraries, such as:

  - Matplotlib for plotting graphs
  - SciPy for statistical analysis
  - Scikit-learn for machine learning algorithms



- Data Cleaning, Merging and Joining: Clean and combine data from multiple sources, handling inconsistencies and duplicates.
- Handling Missing Data: Manage missing values (NaN) in both floating and non-floating point data.
- Column Insertion and Deletion: Easily add, remove or modify columns in a DataFrame.
- Group By Operations: Use "split-apply-combine" to group and analyze data.
- Data Visualization: Create visualizations with Matplotlib and Seaborn, integrated with Pandas.
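
As a rough illustration of the "split-apply-combine" idea mentioned in the list above, here is a minimal sketch on a small made-up table (the `sales` DataFrame, its column names and its values are invented for this example):

```python
import pandas as pd

# Hypothetical toy data, invented for this example
sales = pd.DataFrame({
    "product": ["tea", "coffee", "tea", "coffee"],
    "units": [3, 5, 2, 7],
})

# split the rows by product, apply sum() to each group,
# and combine the results into one Series indexed by product
units_per_product = sales.groupby("product")["units"].sum()
print(units_per_product)
```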






# Intro to DataFrames


- DataFrames are the main data structure in the pandas library, and we can think of them as tables with all sorts of extra functionality.


A Pandas DataFrame is a two-dimensional table-like structure in Python where data is arranged in rows and columns. It’s one of the most commonly used tools for handling data and makes it easy to organize, analyze and manipulate data. It can store different types of data such as numbers, text and dates across its columns. The main parts of a DataFrame are:

- Data: Actual values in the table.
- Rows: Labels that identify each row.
- Columns: Labels that define each data category.


-> We can also say:

- A DataFrame in pandas is basically like a supercharged Excel table inside Python, but with way more power and flexibility.

Here’s the breakdown:

Structure:

Rows → represent records or observations.

Columns → represent variables or features.

Index → labels for each row (can be numbers, dates, or custom IDs).

Key Features:

Can hold different data types in different columns (integers, floats, strings, dates, etc.).

Comes with built-in functions to filter, sort, group, and analyze data.

Works seamlessly with CSV, Excel, SQL, JSON, and many other formats.

Allows vectorized operations, meaning you can perform math or transformations on entire columns at once, much faster than looping.
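
To make the point about vectorized operations concrete, here is a small sketch. The `df_demo` DataFrame and its columns are made up for illustration; the loop is shown only for comparison.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df_demo = pd.DataFrame({"price": [2.5, 4.0, 3.0], "qty": [2, 1, 4]})

# vectorized: one expression operates on whole columns at once
df_demo["total"] = df_demo["price"] * df_demo["qty"]

# the same computation written as an explicit Python loop
totals_loop = [p * q for p, q in zip(df_demo["price"], df_demo["qty"])]

# both approaches produce the same numbers
print(df_demo["total"].tolist() == totals_loop)
```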

In [2]:
# we can create our dataframe directly

df=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])

# we can also add column names to our data frame

df=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]],columns=['a','b','c'])

# we can also add row names to our data frame

df=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]],columns=['a','b','c'],index=['row1','row2','row3'])

# we can also set these after creation with

df.index=['1','2','3']
df.columns=['A','B','C']

# without explicit names, rows and columns get default integer labels (0, 1, 2 here)


# we can also create a data frame from a dictionary
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]})
df.index=['row1','row2','row3']

# where 'a', 'b', 'c' are the column names and each list holds the values of that column

In [24]:
# we can start to look at things within this dataframe

df.head()  # shows the first 5 rows of the dataframe

# head defaults to 5 rows, but we can change it to any number
df.head(2)  # shows the first 2 rows of the dataframe

df.tail()  # shows the last 5 rows of the dataframe

df.tail(2)  # shows the last 2 rows of the dataframe





Unnamed: 0,a,b,c
row2,2,5,8
row3,3,6,9


In [4]:
df.columns  # shows the column names of the dataframe

df.index  # shows the row names of the dataframe

df # shows the whole dataframe

df.shape  # shows the shape of the dataframe (rows, columns)

(3, 3)

In [5]:
# another useful thing to do with dataframes is:

df.info()  # shows the information about the dataframe, like number of rows, columns, data types, memory usage

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, row1 to row3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      int64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
dtypes: int64(3)
memory usage: 96.0+ bytes


In [6]:
# another useful thing to do is:
df.describe()  # shows the statistical summary of the dataframe, like count, mean, std, min, max, etc. for numeric columns

Unnamed: 0,a,b,c
count,3.0,3.0,3.0
mean,2.0,5.0,8.0
std,1.0,1.0,1.0
min,1.0,4.0,7.0
25%,1.5,4.5,7.5
50%,2.0,5.0,8.0
75%,2.5,5.5,8.5
max,3.0,6.0,9.0


In [7]:
# we can count the unique values in each column
df.nunique() # shows the number of unique values in each column

# we can also list the unique values in a single column
df['a'].unique()  # shows the unique values in column 'a'

array([1, 2, 3])
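
The difference between counting and listing unique values is easier to see on a column with repeats. A minimal sketch, assuming a made-up `colors` DataFrame (`value_counts` is an extra, related method that is not used elsewhere in this notebook):

```python
import pandas as pd

# Hypothetical toy data with repeated values
colors = pd.DataFrame({"color": ["red", "blue", "red", "green", "red"]})

n_unique = colors["color"].nunique()     # how many distinct values there are
unique_vals = colors["color"].unique()   # the distinct values, in order of first appearance
counts = colors["color"].value_counts()  # how often each value occurs

print(n_unique)
print(unique_vals)
print(counts)
```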

# Loading our data

In [66]:
coffee=pd.read_csv('./complete-pandas-tutorial/warmup-data/coffee.csv')  # reads a csv file into a dataframe   
coffee.head()  # shows the first 5 rows of the dataframe



Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35


In [9]:
# if you haven't downloaded the file, you can read it straight from a URL instead:
# https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/coffee.csv

coffee= pd.read_csv('https://raw.githubusercontent.com/KeithGalli/complete-pandas-tutorial/refs/heads/master/warmup-data/coffee.csv')  # reads a csv file into a dataframe from a URL

coffee.head()  # shows the first 5 rows of the dataframe

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35


- CSV is the most popular file format, but not always the best one: CSV files are human-readable, but they tend to be large.

- Alternatives include the Feather and Parquet formats. You can also combine many CSV files into XLSX files (which are large too).

In [10]:
results = pd.read_parquet('./complete-pandas-tutorial/data/results.parquet')  # reads a parquet file into a dataframe
results.head()  # shows the first 5 rows of the dataframe

Unnamed: 0,year,type,discipline,event,as,athlete_id,noc,team,place,tied,medal
0,1912.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,17.0,True,
1,1912.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jean Montariol,,False,
2,1920.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,32.0,True,
3,1920.0,Summer,Tennis,"Doubles, Mixed (Olympic)",Jean-François Blanchy,1,FRA,Jeanne Vaussard,8.0,True,
4,1920.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jacques Brugnon,4.0,False,


In [13]:
olimpics = pd.read_excel('./complete-pandas-tutorial/data/olympics-data.xlsx')  # reads an excel file into a dataframe
olimpics.head()  # shows the first 5 rows of the dataframe

Unnamed: 0,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
0,1,Jean-François Blanchy,1886-12-12,Bordeaux,Gironde,FRA,France,,,1960-10-02
1,2,Arnaud Boetsch,1969-04-01,Meulan,Yvelines,FRA,France,183.0,76.0,
2,3,Jean Borotra,1898-08-13,Biarritz,Pyrénées-Atlantiques,FRA,France,183.0,76.0,1994-07-17
3,4,Jacques Brugnon,1895-05-11,Paris VIIIe,Paris,FRA,France,168.0,64.0,1978-03-20
4,5,Albert Canet,1878-04-17,Wandsworth,England,GBR,France,,,1930-07-25


In [None]:
olimpics.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145500 entries, 0 to 145499
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   athlete_id    145500 non-null  int64  
 1   name          145500 non-null  object 
 2   born_date     143693 non-null  object 
 3   born_city     110908 non-null  object 
 4   born_region   110908 non-null  object 
 5   born_country  110908 non-null  object 
 6   NOC           145499 non-null  object 
 7   height_cm     106651 non-null  float64
 8   weight_kg     102070 non-null  float64
 9   died_date     33940 non-null   object 
dtypes: float64(2), int64(1), object(7)
memory usage: 11.1+ MB


In [16]:
olimpics.describe()

Unnamed: 0,athlete_id,height_cm,weight_kg
count,145500.0,106651.0,102070.0
mean,73686.188955,176.333724,71.890996
std,42868.960158,10.380282,14.46554
min,1.0,127.0,25.0
25%,36663.75,170.0,62.0
50%,73302.5,176.0,70.0
75%,110306.25,183.0,80.0
max,149814.0,226.0,198.0


In [17]:
bios= pd.read_csv('./complete-pandas-tutorial/data/bios.csv')  # reads a csv file into a dataframe

In [None]:
# worth mentioning: we can also write a dataframe back out with functions like
# bios.to_csv('bios.csv')  # saves the dataframe to a csv file
# bios.to_excel('bios.xlsx')  # saves the dataframe to an excel file
# bios.to_parquet('bios.parquet')  # saves the dataframe to a parquet file
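
One detail worth knowing when saving: `to_csv` writes the row index as an extra first column unless `index=False` is passed, which is where columns named `Unnamed: 0` usually come from when the file is read back. A minimal sketch, using an in-memory buffer and a made-up `small` DataFrame so that no file is written:

```python
import io
import pandas as pd

# Hypothetical toy data, invented for this example
small = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
small.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
small.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```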

# Accessing Data with pandas

In [18]:
coffee.head()  # shows the first 5 rows of the dataframe

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35


In [25]:
# to display the whole dataframe
display(coffee)  # shows the whole dataframe
# coffee
# print(coffee)  # shows the whole dataframe in the console

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35
5,Wednesday,Latte,25
6,Thursday,Espresso,40
7,Thursday,Latte,30
8,Friday,Espresso,45
9,Friday,Latte,35


In [26]:
coffee.tail()

Unnamed: 0,Day,Coffee Type,Units Sold
9,Friday,Latte,35
10,Saturday,Espresso,45
11,Saturday,Latte,35
12,Sunday,Espresso,45
13,Sunday,Latte,35


In [29]:
# if you want to access random data 
coffee.sample(5)  # shows 5 random rows from the dataframe

Unnamed: 0,Day,Coffee Type,Units Sold
1,Monday,Latte,15
2,Tuesday,Espresso,30
7,Thursday,Latte,30
0,Monday,Espresso,25
12,Sunday,Espresso,45


# loc & iloc

In [None]:
# if we want to access specific rows and columns, we can use loc and iloc
# loc is used for label based indexing, while iloc is used for position based indexing

# coffee.loc[row, column]

coffee.loc[0] # this returns the row labeled 0, which is the first row here

Day              Monday
Coffee Type    Espresso
Units Sold           25
Name: 0, dtype: object

In [33]:
coffee.loc[[1,2,5]]

Unnamed: 0,Day,Coffee Type,Units Sold
1,Monday,Latte,15
2,Tuesday,Espresso,30
5,Wednesday,Latte,25


In [39]:
# we can work with slices as well
coffee.loc[1:3, ["Day", "Units Sold"]]


Unnamed: 0,Day,Units Sold
1,Monday,15
2,Tuesday,30
3,Tuesday,20


In [None]:
# we can select by integer position with iloc

coffee.iloc[1:3,[0,2]]

# the upper bound is not included, so this returns rows 1 and 2, and columns 0 and 2

Unnamed: 0,Day,Units Sold
1,Monday,15
2,Tuesday,30
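
The two slices above look alike but behave differently at the upper bound: `loc` slices by label and includes both ends, while `iloc` slices by position and excludes the end. A small sketch on a made-up `demo` DataFrame with the default integer index:

```python
import pandas as pd

# Hypothetical toy data with the default index 0..4
demo = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

by_label = demo.loc[1:3]      # labels 1, 2 and 3 (end included)
by_position = demo.iloc[1:3]  # positions 1 and 2 (end excluded)

print(len(by_label), len(by_position))
```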


In [55]:
# here we index the dataframe by the Day column, so the Day values become the row labels
coffee.index = coffee.Day
# or 
coffee.index = coffee['Day']
coffee

# label slices now use the day names, and both endpoints are included
coffee.loc["Monday":"Wednesday", "Units Sold"]

Day
Monday       25
Monday       15
Tuesday      30
Tuesday      20
Wednesday    35
Wednesday    25
Name: Units Sold, dtype: int64

In [57]:
coffee.loc["Monday":"Wednesday", "Day": "Units Sold"]

Unnamed: 0_level_0,Day,Coffee Type,Units Sold
Day,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Monday,Monday,Espresso,25
Monday,Monday,Latte,15
Tuesday,Tuesday,Espresso,30
Tuesday,Tuesday,Latte,20
Wednesday,Wednesday,Espresso,35
Wednesday,Wednesday,Latte,25


In [61]:
coffee.iloc[0:5, 0:2 ]

Unnamed: 0,Day,Coffee Type
0,Monday,Espresso
1,Monday,Latte
2,Tuesday,Espresso
3,Tuesday,Latte
4,Wednesday,Espresso


In [65]:
# what if we want to set specific values?

coffee.loc[1:3, "Units Sold"] = 10  # sets Units Sold to 10 for the rows labeled 1 through 3
coffee

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,10
2,Tuesday,Espresso,10
3,Tuesday,Latte,10
4,Wednesday,Espresso,35
5,Wednesday,Latte,25
6,Thursday,Espresso,40
7,Thursday,Latte,30
8,Friday,Espresso,45
9,Friday,Latte,35


In [69]:
# there is a more optimized way to get a single value
coffee.at[1, "Units Sold"]   # gets the value of the Units Sold column for the row labeled 1

# note that it gets only one value, not a Series or a DataFrame, so it is more efficient

# we also have:
coffee.iat[1, 2]  # gets the value at row position 1, column position 2 (the Units Sold column)

np.int64(15)
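
For reference, here is a short sketch of `at` and `iat` side by side, including assignment of a single cell. The `demo` DataFrame is a made-up, two-row stand-in for the coffee data:

```python
import pandas as pd

# Hypothetical two-row stand-in for the coffee data
demo = pd.DataFrame({"Day": ["Monday", "Monday"], "Units Sold": [25, 15]})

single_by_label = demo.at[1, "Units Sold"]  # row label 1, column label "Units Sold"
single_by_position = demo.iat[1, 1]         # row position 1, column position 1

# at (and iat) can also assign a single cell
demo.at[0, "Units Sold"] = 99

print(single_by_label, single_by_position)
```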

In [73]:
# to grab specific columns, we can use the following syntax
coffee[['Day', 'Units Sold']]  # gets the Day and Units Sold columns
# or 
coffee.Day

0        Monday
1        Monday
2       Tuesday
3       Tuesday
4     Wednesday
5     Wednesday
6      Thursday
7      Thursday
8        Friday
9        Friday
10     Saturday
11     Saturday
12       Sunday
13       Sunday
Name: Day, dtype: object

In [78]:
# we can also sort our data 

coffee.sort_values("Units Sold")  # sorts the dataframe by the Units Sold column in ascending order

# if we want descending order, we can use the following syntax
coffee.sort_values("Units Sold", ascending=False)  # sorts the dataframe by the Units Sold column in descending order

# we can also sort by multiple columns
coffee.sort_values(["Units Sold", "Coffee Type"], ascending= [False , True])  
# sorted by the Units Sold column first (descending), then by the Coffee Type column (ascending) to break ties

Unnamed: 0,Day,Coffee Type,Units Sold
8,Friday,Espresso,45
10,Saturday,Espresso,45
12,Sunday,Espresso,45
6,Thursday,Espresso,40
4,Wednesday,Espresso,35
9,Friday,Latte,35
11,Saturday,Latte,35
13,Sunday,Latte,35
2,Tuesday,Espresso,30
7,Thursday,Latte,30


In [80]:
# sometimes we need to iterate over the rows of a dataframe

# with a for loop:

for index , row in coffee.iterrows():
    print(index)
    print(row)
    print("-----")  

0
Day              Monday
Coffee Type    Espresso
Units Sold           25
Name: 0, dtype: object
-----
1
Day            Monday
Coffee Type     Latte
Units Sold         15
Name: 1, dtype: object
-----
2
Day             Tuesday
Coffee Type    Espresso
Units Sold           30
Name: 2, dtype: object
-----
3
Day            Tuesday
Coffee Type      Latte
Units Sold          20
Name: 3, dtype: object
-----
4
Day            Wednesday
Coffee Type     Espresso
Units Sold            35
Name: 4, dtype: object
-----
5
Day            Wednesday
Coffee Type        Latte
Units Sold            25
Name: 5, dtype: object
-----
6
Day            Thursday
Coffee Type    Espresso
Units Sold           40
Name: 6, dtype: object
-----
7
Day            Thursday
Coffee Type       Latte
Units Sold           30
Name: 7, dtype: object
-----
8
Day              Friday
Coffee Type    Espresso
Units Sold           45
Name: 8, dtype: object
-----
9
Day            Friday
Coffee Type     Latte
Units Sold         35
Name: 9,
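
As a side note, `iterrows` builds a Series for every row, which can be slow on large tables. One commonly used alternative is `itertuples`, which yields lightweight namedtuples. A minimal sketch on made-up data (the `demo` DataFrame and its column names are invented, and the column names are kept free of spaces so they work as attribute names):

```python
import pandas as pd

# Hypothetical toy data, invented for this example
demo = pd.DataFrame({"units": [25, 15, 30], "price": [4, 5, 4]})

# iterrows: each row arrives as (index, Series)
total_iterrows = 0
for index, row in demo.iterrows():
    total_iterrows += row["units"] * row["price"]

# itertuples: each row arrives as a namedtuple with one field per column
total_itertuples = 0
for row in demo.itertuples(index=False):
    total_itertuples += row.units * row.price

print(total_iterrows, total_itertuples)
```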

# Filtering Data

In [82]:
bios.info()  # shows a summary of the bios dataframe: number of rows, columns, data types, memory usage

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145500 entries, 0 to 145499
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   athlete_id    145500 non-null  int64  
 1   name          145500 non-null  object 
 2   born_date     143693 non-null  object 
 3   born_city     110908 non-null  object 
 4   born_region   110908 non-null  object 
 5   born_country  110908 non-null  object 
 6   NOC           145499 non-null  object 
 7   height_cm     106651 non-null  float64
 8   weight_kg     102070 non-null  float64
 9   died_date     33940 non-null   object 
dtypes: float64(2), int64(1), object(7)
memory usage: 11.1+ MB


In [92]:
# if we want to filter based on height_cm

bios.loc[bios["height_cm"] > 220 , ['name' , 'height_cm']] # keeps only rows where height_cm is greater than 220, showing the name and height_cm columns

# we can also use a shorthand syntax
bios[bios["height_cm"] > 220][['name' , 'height_cm']]  # same filter, written with chained brackets

Unnamed: 0,name,height_cm
5673,Gunther Behnke,221.0
5781,Tommy Burleson,223.0
6978,Arvydas Sabonis,223.0
89070,Yao Ming,226.0
89075,Roberto Dueñas,221.0
120266,Zhang Zhaoxu,221.0


In [97]:
# what if I want multiple conditions?
bios[(bios["height_cm"]>215) & (bios["born_country"]=="USA")][['name' , 'height_cm']]  # keeps only rows where height_cm is greater than 215 and born_country is "USA"

Unnamed: 0,name,height_cm
5781,Tommy Burleson,223.0
6722,Shaquille O'Neal,216.0
6937,David Robinson,216.0
123850,Tyson Chandler,216.0


In [103]:
# we can also do some specific filters based on string operations

bios[bios["name"].str.contains("Keith")]  # keeps only rows where the name column contains the string "Keith"
# the search is case sensitive by default, but we can make it case insensitive with the following syntax
bios[bios["name"].str.contains("Keith", case=False)]  # same filter, ignoring case

# we can work with regular expressions as well
bios[bios["name"].str.contains("K.*h", regex=True)]  # keeps only rows where the name contains "K" followed by any characters and then "h"

Unnamed: 0,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
126,127,Claudia Kohde-Kilsch,1963-12-11,Saarbrücken,Saarland,GER,West Germany,184.0,68.0,
132,133,Katrin Schmidt,1967-09-28,Langenhagen,Niedersachsen,GER,Germany,180.0,80.0,
134,135,Karen Stechmann,1971-09-15,Stade,Niedersachsen,GER,Germany,169.0,59.0,
150,151,Stephan Kuhl,1968-03-27,Köln (Cologne),Nordrhein-Westfalen,GER,Germany,180.0,79.0,
186,187,Kenneth Erichsen,1972-12-28,,,,Guatemala,179.0,72.0,
...,...,...,...,...,...,...,...,...,...,...
145335,149057,Vitaliy Kalinichenko,1993-08-09,,,,Ukraine,,,
145346,149069,Kaysha Love,1997-09-24,West Jordan,Utah,USA,United States,,,
145370,149093,Kaila Kuhn,2003-04-08,Boyne City,Michigan,USA,United States,,,
145461,149188,Kent Johnson,2002-10-18,Port Moody,British Columbia,CAN,Canada,185.0,75.0,


In [108]:
bios[bios["born_country"].isin(["USA", "FRA"]) & bios["name"].str.contains("Keith" ,  case = False)]  # keeps only rows where born_country is either "USA" or "FRA" and the name contains "Keith" (case insensitive)

Unnamed: 0,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
50929,51288,Keith Carter,1924-08-30,Akron,Ohio,USA,United States,,,2013-05-03
51185,51544,Keith Russell,1948-01-15,Mesa,Arizona,USA,United States,188.0,73.0,
52913,53288,Keith Erickson,1944-04-19,San Francisco,California,USA,United States,196.0,86.0,
62678,63144,Keith Notary,1960-01-22,Merritt Island,Florida,USA,United States,170.0,66.0,
77550,78141,Keith Brantly,1962-05-23,Scott Air Force Base,Illinois,USA,United States,180.0,64.0,
84097,84766,Keith Christiansen,1944-07-14,International Falls,Minnesota,USA,United States,165.0,69.0,2018-11-05
94646,95413,Keith Meyer,1938-06-20,Geneva,Illinois,USA,United States,,,2010-07-25
97499,98286,Keith Tkachuk,1972-03-28,Melrose,Massachusetts,USA,United States,188.0,102.0,
98068,98860,Keith Wegeman,1929-08-28,Denver,Colorado,USA,United States,,,1974-08-22
99921,100722,Keith Carney,1970-02-03,Providence,Rhode Island,USA,United States,188.0,93.0,


In [109]:
# we can use the query function as well
bios.query("born_country == 'USA' and height_cm > 220")[['name', 'height_cm']]  # filters the dataframe to show only rows where the born_country column is "USA" and height_cm is greater than 220  

Unnamed: 0,name,height_cm
5781,Tommy Burleson,223.0


# Adding and removing columns 

In [110]:
coffee.head()

Unnamed: 0,Day,Coffee Type,Units Sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35


In [112]:
# if we want to add a new column to the dataframe, we can use the following syntax
coffee["price"]= 4.99
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,price
0,Monday,Espresso,25,4.99
1,Monday,Latte,15,4.99
2,Tuesday,Espresso,30,4.99
3,Tuesday,Latte,20,4.99
4,Wednesday,Espresso,35,4.99
5,Wednesday,Latte,25,4.99
6,Thursday,Espresso,40,4.99
7,Thursday,Latte,30,4.99
8,Friday,Espresso,45,4.99
9,Friday,Latte,35,4.99


In [127]:
# what if we want to be more specific, e.g. a different price for each coffee type?

coffee["new_price"] = np.where(coffee["Coffee Type"] == "Espresso", 3.99, 4.99)  # sets the price to 3.99 for Espresso and 4.99 for other coffee types
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,new_price
0,Monday,Espresso,25,3.99
1,Monday,Latte,15,4.99
2,Tuesday,Espresso,30,3.99
3,Tuesday,Latte,20,4.99
4,Wednesday,Espresso,35,3.99
5,Wednesday,Latte,25,4.99
6,Thursday,Espresso,40,3.99
7,Thursday,Latte,30,4.99
8,Friday,Espresso,45,3.99
9,Friday,Latte,35,4.99


In [120]:
# to delete a column:

# by default, drop() does not modify the original dataframe; it returns a new dataframe with the specified columns dropped
# if we want to modify the original dataframe, we can use the following syntax
coffee.drop(columns=["price"], inplace=True)  # drops the price column from the dataframe and modifies the original dataframe



KeyError: "['price'] not found in axis"
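
The KeyError above is what pandas raises when the column to drop is no longer in the dataframe, for example when a drop cell is run a second time. If that situation is expected, `drop` accepts `errors="ignore"` to skip missing labels. A minimal sketch on a made-up `demo` DataFrame:

```python
import pandas as pd

# Hypothetical toy data, invented for this example
demo = pd.DataFrame({"a": [1], "price": [4.99]})

# first drop removes the column
demo = demo.drop(columns=["price"])

# dropping it again would raise KeyError; errors="ignore" skips labels that are not found
demo = demo.drop(columns=["price"], errors="ignore")

print(list(demo.columns))
```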

In [121]:
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,new_price
0,Monday,Espresso,25,3.99
1,Monday,Latte,15,4.99
2,Tuesday,Espresso,30,3.99
3,Tuesday,Latte,20,4.99
4,Wednesday,Espresso,35,3.99
5,Wednesday,Latte,25,4.99
6,Thursday,Espresso,40,3.99
7,Thursday,Latte,30,4.99
8,Friday,Espresso,45,3.99
9,Friday,Latte,35,4.99


In [123]:
# take care:
new_coffee = coffee
# this creates another name that points to the same dataframe, so if we modify new_coffee, coffee will be modified as well
# to make a real, independent copy:
new_coffee = coffee.copy()
new_coffee

Unnamed: 0,Day,Coffee Type,Units Sold,new_price
0,Monday,Espresso,25,3.99
1,Monday,Latte,15,4.99
2,Tuesday,Espresso,30,3.99
3,Tuesday,Latte,20,4.99
4,Wednesday,Espresso,35,3.99
5,Wednesday,Latte,25,4.99
6,Thursday,Espresso,40,3.99
7,Thursday,Latte,30,4.99
8,Friday,Espresso,45,3.99
9,Friday,Latte,35,4.99


In [128]:
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,new_price
0,Monday,Espresso,25,3.99
1,Monday,Latte,15,4.99
2,Tuesday,Espresso,30,3.99
3,Tuesday,Latte,20,4.99
4,Wednesday,Espresso,35,3.99
5,Wednesday,Latte,25,4.99
6,Thursday,Espresso,40,3.99
7,Thursday,Latte,30,4.99
8,Friday,Espresso,45,3.99
9,Friday,Latte,35,4.99


In [None]:
# we can also reassign the result instead of using inplace=True
coffee= coffee.drop(columns=["price"])  # drops the price column and rebinds coffee to the new dataframe

In [130]:
coffee["revenue"]= coffee["Units Sold"] * coffee["new_price"]  # calculates the revenue for each row by multiplying the Units Sold and new_price columns
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,new_price,revenue
0,Monday,Espresso,25,3.99,99.75
1,Monday,Latte,15,4.99,74.85
2,Tuesday,Espresso,30,3.99,119.7
3,Tuesday,Latte,20,4.99,99.8
4,Wednesday,Espresso,35,3.99,139.65
5,Wednesday,Latte,25,4.99,124.75
6,Thursday,Espresso,40,3.99,159.6
7,Thursday,Latte,30,4.99,149.7
8,Friday,Espresso,45,3.99,179.55
9,Friday,Latte,35,4.99,174.65


In [132]:
coffee.rename(columns={"new_price": "price"}, inplace=True)  # renames the new_price column to price in the dataframe
coffee

Unnamed: 0,Day,Coffee Type,Units Sold,price,revenue
0,Monday,Espresso,25,3.99,99.75
1,Monday,Latte,15,4.99,74.85
2,Tuesday,Espresso,30,3.99,119.7
3,Tuesday,Latte,20,4.99,99.8
4,Wednesday,Espresso,35,3.99,139.65
5,Wednesday,Latte,25,4.99,124.75
6,Thursday,Espresso,40,3.99,159.6
7,Thursday,Latte,30,4.99,149.7
8,Friday,Espresso,45,3.99,179.55
9,Friday,Latte,35,4.99,174.65
