In [2]:
import pandas as pd
import numpy as np

## Exercise 1: Series Creation and Basic Operations (Focus: Series)

- **Scenario:** You're tracking the daily average temperature in Dosquebradas for a week.

- **Task:**

    1. Create a Pandas Series named temperatures with the following average temperatures (in Celsius) for 7 days: 24.5, 25.1, 23.9, 26.0, 24.8, 25.5, 24.2.

    2. Assign meaningful labels (e.g., 'Monday', 'Tuesday', ...) as the index for this Series.

    3. Calculate and print the average temperature for the week.

    4. Print all temperatures above 25.0 degrees Celsius.

In [8]:
# Daily average temperatures (°C), labeled by day of the week
week_temp = pd.Series([24.5, 25.1, 23.9, 26.0, 24.8, 25.5, 24.2], index=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])
week_temp

Monday       24.5
Tuesday      25.1
Wednesday    23.9
Thursday     26.0
Friday       24.8
Saturday     25.5
Sunday       24.2
dtype: float64

In [10]:
# Temperature mean for the week
temp_mean = week_temp.mean()
temp_mean

np.float64(24.857142857142858)

In [14]:
# Temperatures above 25.0 °C
bool_expr = week_temp > 25.0
week_temp[bool_expr]

Tuesday     25.1
Thursday    26.0
Saturday    25.5
dtype: float64
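The boolean filter in the last cell works because the comparison itself returns a Series of `True`/`False` values that shares `week_temp`'s index. A small self-contained sketch (it rebuilds the same data so it runs on its own) that makes the mask visible, counts the matching days, and prints the mean rounded for display:

```python
import pandas as pd

# Same weekly data as above, rebuilt so this block runs on its own
week_temp = pd.Series(
    [24.5, 25.1, 23.9, 26.0, 24.8, 25.5, 24.2],
    index=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
)

# The comparison returns a boolean Series aligned on the same day labels
mask = week_temp > 25.0
print(mask.dtype)  # bool

# Indexing with the mask keeps only the labels where it is True
hot_days = week_temp[mask]
print(hot_days.index.tolist())  # ['Tuesday', 'Thursday', 'Saturday']

# True counts as 1, so summing the mask counts the matching days
print(int(mask.sum()))  # 3

# Format the mean to two decimals for printing
print(f"Weekly mean: {week_temp.mean():.2f} °C")  # Weekly mean: 24.86 °C
```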

## Exercise 2: DataFrame Creation and Column Selection (Focus: DataFrame & Basic Indexing)

- **Scenario:** You want to record some basic information about a few coffee farms near Dosquebradas.

- **Task:**

1. Create a Pandas DataFrame named coffee_farms with the following data:

   - Farm Name: 'La Aurora', 'El Mirador', 'Bella Vista'

   - Altitude (meters): 1600, 1850, 1700

   - Coffee Variety: 'Castillo', 'Caturra', 'Colombia'

   - Annual Production (kg): 5000, 7500, 6000

2. Set 'Farm Name' as the index of the DataFrame.

3. Select and print only the 'Altitude (meters)' column.

4. Select and print the 'Coffee Variety' and 'Annual Production (kg)' columns for all farms.

In [66]:
# DataFrame creation; the farm names are also used as the index
coffee_farms = {"Farm Name": ["La Aurora", "El Mirador", "Bella Vista"],
                "Altitude (meters)": [1600, 1850, 1700],
                "Coffee Variety": ["Castillo", "Caturra", "Colombia"],
                "Annual Production (Kg)": [5000, 7500, 6000]}

# index= reuses the names as row labels but keeps 'Farm Name' as a regular column too
df_coffee_farms = pd.DataFrame(coffee_farms, index=coffee_farms["Farm Name"])
df_coffee_farms

|   | Farm Name | Altitude (meters) | Coffee Variety | Annual Production (Kg) |
|---|---|---|---|---|
| La Aurora | La Aurora | 1600 | Castillo | 5000 |
| El Mirador | El Mirador | 1850 | Caturra | 7500 |
| Bella Vista | Bella Vista | 1700 | Colombia | 6000 |
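Task 2 asks for 'Farm Name' to become the index. Passing `index=` to the constructor only copies the names as row labels, which is why 'Farm Name' still shows up as a column in the output above. A sketch comparing that with `DataFrame.set_index()`, which moves the column into the index and names the index after it (variable names here are illustrative):

```python
import pandas as pd

coffee_farms = {
    "Farm Name": ["La Aurora", "El Mirador", "Bella Vista"],
    "Altitude (meters)": [1600, 1850, 1700],
    "Coffee Variety": ["Castillo", "Caturra", "Colombia"],
    "Annual Production (Kg)": [5000, 7500, 6000],
}

# index= assigns row labels, but 'Farm Name' is still a regular column
df_with_index_arg = pd.DataFrame(coffee_farms, index=coffee_farms["Farm Name"])
print(df_with_index_arg.columns.tolist())
# ['Farm Name', 'Altitude (meters)', 'Coffee Variety', 'Annual Production (Kg)']

# set_index() moves the column into the index and names the index 'Farm Name'
df_set_index = pd.DataFrame(coffee_farms).set_index("Farm Name")
print(df_set_index.columns.tolist())
# ['Altitude (meters)', 'Coffee Variety', 'Annual Production (Kg)']
print(df_set_index.index.name)  # Farm Name
```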


In [67]:
# Print only one column
df_coffee_farms["Altitude (meters)"]

La Aurora      1600
El Mirador     1850
Bella Vista    1700
Name: Altitude (meters), dtype: int64

----
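***Note**: `.loc[]` and `.iloc[]` are DataFrame accessors that take rows first: `df.loc[row, col]` selects by labels and `df.iloc[row, col]` by integer positions. They select columns too when given the second argument (e.g. `df.loc[:, "Altitude (meters)"]`); `df["col"]` is the shortcut for a single column.*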
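A minimal sketch of the same column selection done three ways, on a cut-down copy of the coffee farm table (the two-column frame `df` is built here only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Altitude (meters)": [1600, 1850, 1700],
        "Coffee Variety": ["Castillo", "Caturra", "Colombia"],
    },
    index=["La Aurora", "El Mirador", "Bella Vista"],
)

# df[label] selects a single column
by_bracket = df["Altitude (meters)"]

# loc[rows, cols]: ':' keeps every row, the second argument picks the column by label
by_loc = df.loc[:, "Altitude (meters)"]

# iloc[rows, cols]: the same selection by position (column 0)
by_iloc = df.iloc[:, 0]

print(by_bracket.equals(by_loc) and by_loc.equals(by_iloc))  # True

# One row label plus one column label returns a single value
print(df.loc["El Mirador", "Coffee Variety"])  # Caturra
```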

In [68]:
# Print two columns, selected by label with filter()
df_coffee_farms.filter(["Coffee Variety", "Annual Production (Kg)"], axis="columns")

|   | Coffee Variety | Annual Production (Kg) |
|---|---|---|
| La Aurora | Castillo | 5000 |
| El Mirador | Caturra | 7500 |
| Bella Vista | Colombia | 6000 |


----
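***Note**: `DataFrame.filter()` subsets rows or columns by their labels (an `items` list, a `like` substring, or a `regex` pattern); it does not filter by cell values, which is what boolean indexing is for.*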
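A small sketch of the three label-matching modes of `filter()`, plus `axis="index"` to match row labels instead of column labels, on a trimmed copy of the coffee farm data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Altitude (meters)": [1600, 1850, 1700],
        "Coffee Variety": ["Castillo", "Caturra", "Colombia"],
        "Annual Production (Kg)": [5000, 7500, 6000],
    },
    index=["La Aurora", "El Mirador", "Bella Vista"],
)

# items=: keep exactly these column labels
by_items = df.filter(items=["Coffee Variety", "Annual Production (Kg)"])

# like=: keep column labels that contain the substring
by_like = df.filter(like="Production")

# regex=: keep column labels matching a regular expression
by_regex = df.filter(regex=r"^(Coffee|Annual)")

# axis="index": apply the same label matching to row labels
rows = df.filter(like="Vista", axis="index")

print(by_items.columns.tolist())  # ['Coffee Variety', 'Annual Production (Kg)']
print(by_like.columns.tolist())   # ['Annual Production (Kg)']
print(by_regex.columns.tolist())  # ['Coffee Variety', 'Annual Production (Kg)']
print(rows.index.tolist())        # ['Bella Vista']
```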

## Exercise 3: Data Selection with loc and iloc (Focus: Indexing Objects - loc & iloc)

- **Scenario:** You have a DataFrame of tourist attractions in Risaralda.

- **Task:**

1. Create a DataFrame named attractions with the following data:

    - Name: 'Nevado Santa Isabel', 'Termales de Santa Rosa', 'Viaducto César Gaviria Trujillo', 'Parque Consotá'

    - Location: 'Santa Rosa de Cabal', 'Santa Rosa de Cabal', 'Pereira/Dosquebradas', 'Pereira'

    - Type: 'Mountain', 'Hot Springs', 'Bridge', 'Recreational Park'

    - Rating (out of 5): 4.8, 4.5, 4.2, 4.0

2. Using loc, select and print the entire row for 'Termales de Santa Rosa'.

3. Using iloc, select and print the 'Name' and 'Location' for the first two attractions.

4. Using loc, select and print the 'Type' and 'Rating' for 'Nevado Santa Isabel' and 'Parque Consotá'.

In [153]:
# Create DF
attractions = {
    "Name": ['Nevado Santa Isabel', 'Termales de Santa Rosa', 'Viaducto César Gaviria Trujillo', 'Parque Consotá'],
    "Location": ['Santa Rosa de Cabal', 'Santa Rosa de Cabal', 'Pereira/Dosquebradas', 'Pereira'],
    "Type": ['Mountain', 'Hot Springs', 'Bridge', 'Recreational Park'],
    "Rating (out of 5)": [4.8, 4.5, 4.2, 4.0],
}

df_attractions= pd.DataFrame(attractions)
df_attractions

|   | Name | Location | Type | Rating (out of 5) |
|---|---|---|---|---|
| 0 | Nevado Santa Isabel | Santa Rosa de Cabal | Mountain | 4.8 |
| 1 | Termales de Santa Rosa | Santa Rosa de Cabal | Hot Springs | 4.5 |
| 2 | Viaducto César Gaviria Trujillo | Pereira/Dosquebradas | Bridge | 4.2 |
| 3 | Parque Consotá | Pereira | Recreational Park | 4.0 |


In [134]:
# Print the 'Termales de Santa Rosa' row with loc[] (its label is 1 in the default integer index)
df_attractions.loc[1]

Name                 Termales de Santa Rosa
Location                Santa Rosa de Cabal
Type                            Hot Springs
Rating (out of 5)                       4.5
Name: 1, dtype: object

In [139]:
# Print 'Name' and 'Location' (columns 0 and 1) for the first two rows with iloc[]
df_attractions.iloc[:2, [0, 1]]    # Fancy indexing

|   | Name | Location |
|---|---|---|
| 0 | Nevado Santa Isabel | Santa Rosa de Cabal |
| 1 | Termales de Santa Rosa | Santa Rosa de Cabal |


In [156]:
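# Print two cols and two rows with loc[]
# First way: set 'Name' as the index, then select rows by label
df_name_as_index = df_attractions.set_index("Name")
df_name_as_index.loc[["Nevado Santa Isabel", "Parque Consotá"], ["Type", "Rating (out of 5)"]]

# Second way: keep the default index and select the rows with a boolean mask
# (a slice such as loc[:1] would return the first two rows instead of the requested ones)
df_attractions.loc[df_attractions["Name"].isin(["Nevado Santa Isabel", "Parque Consotá"]), ["Type", "Rating (out of 5)"]]

|   | Type | Rating (out of 5) |
|---|---|---|
| 0 | Mountain | 4.8 |
| 3 | Recreational Park | 4.0 |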
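One detail worth keeping in mind when slicing rows: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position, like Python lists. A short sketch using the attraction names as the index (the `types` frame is built here only for illustration):

```python
import pandas as pd

# Attraction types keyed by name, so loc works with string labels
types = pd.DataFrame(
    {"Type": ["Mountain", "Hot Springs", "Bridge", "Recreational Park"]},
    index=["Nevado Santa Isabel", "Termales de Santa Rosa",
           "Viaducto César Gaviria Trujillo", "Parque Consotá"],
)

# loc slices by label and INCLUDES the end label: 3 rows
by_label = types.loc["Nevado Santa Isabel":"Viaducto César Gaviria Trujillo"]

# iloc slices by position and EXCLUDES the end position: 2 rows
by_position = types.iloc[0:2]

print(len(by_label), len(by_position))  # 3 2

# With the default RangeIndex labels equal positions, so loc[:1] matches iloc[:2]
default = types.reset_index(drop=True)
print(default.loc[:1].equals(default.iloc[:2]))  # True
```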


## Exercise 4: Adding/Modifying Data and Boolean Indexing (Focus: Series, DataFrame, Indexing)

- **Scenario:** You're managing a small inventory of traditional Colombian crafts in a shop in Dosquebradas.

- **Task:**

1. Create a DataFrame craft_inventory with the following data:

    - Item: 'Sombrero Vueltiao', 'Mochila Arhuaca', 'Ruana', 'Artesanía en Guadua'

    - Quantity: 15, 8, 12, 20

    - Price (COP): 85000, 120000, 95000, 30000

2. Add a new column named 'In Stock' which is True if 'Quantity' is greater than 10, and False otherwise.

3. Increase the 'Price (COP)' of all items by 10% (imagine a slight price adjustment).

4. Using boolean indexing, print only the items that are currently 'In Stock'.

In [107]:
# Create DF and add new column
craft_inventory= {
    "Item": ['Sombrero Vueltiao', 'Mochila Arhuaca', 'Ruana', 'Artesanía en Guadua'],
    "Quantity": [15, 8, 12, 20],
    "Price (COP)": [85000, 120000, 95000, 30000],
}

df_craft_inventory= pd.DataFrame(craft_inventory)
df_craft_inventory["Stock"] = df_craft_inventory["Quantity"] > 10
df_craft_inventory

|   | Item | Quantity | Price (COP) | Stock |
|---|---|---|---|---|
| 0 | Sombrero Vueltiao | 15 | 85000 | True |
| 1 | Mochila Arhuaca | 8 | 120000 | False |
| 2 | Ruana | 12 | 95000 | True |
| 3 | Artesanía en Guadua | 20 | 30000 | True |


In [108]:
# Increase price by 10% (the column becomes float64)
df_craft_inventory["Price (COP)"] += df_craft_inventory["Price (COP)"] * 0.1
df_craft_inventory

|   | Item | Quantity | Price (COP) | Stock |
|---|---|---|---|---|
| 0 | Sombrero Vueltiao | 15 | 93500.0 | True |
| 1 | Mochila Arhuaca | 8 | 132000.0 | False |
| 2 | Ruana | 12 | 104500.0 | True |
| 3 | Artesanía en Guadua | 20 | 33000.0 | True |
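The two cells above modify `df_craft_inventory` in place. A sketch of the same two steps with `DataFrame.assign()`, which returns a new DataFrame and leaves the original untouched; it uses the column name 'In Stock' given in the task, passed through `**` unpacking because it contains a space:

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Sombrero Vueltiao", "Mochila Arhuaca", "Ruana", "Artesanía en Guadua"],
    "Quantity": [15, 8, 12, 20],
    "Price (COP)": [85000, 120000, 95000, 30000],
})

# assign() returns a NEW DataFrame: 'In Stock' is added and 'Price (COP)' is replaced
adjusted = df.assign(**{
    "In Stock": df["Quantity"] > 10,
    "Price (COP)": df["Price (COP)"] + df["Price (COP)"] * 0.1,
})

print(df["Price (COP)"].tolist())        # [85000, 120000, 95000, 30000]
print(adjusted["Price (COP)"].tolist())  # [93500.0, 132000.0, 104500.0, 33000.0]
print(adjusted["In Stock"].tolist())     # [True, False, True, True]
```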


In [130]:
# Boolean indexing: keep only the rows where 'Stock' is True, then hide the 'Stock' column
in_stock = df_craft_inventory[df_craft_inventory["Stock"]]
in_stock.drop(columns="Stock")

|   | Item | Quantity | Price (COP) |
|---|---|---|---|
| 0 | Sombrero Vueltiao | 15 | 93500.0 |
| 2 | Ruana | 12 | 104500.0 |
| 3 | Artesanía en Guadua | 20 | 33000.0 |
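A boolean column can be passed as the mask directly (no `== True` needed), and masks can be negated with `~` or combined with `&` and `|`. A sketch on a rebuilt copy of the adjusted inventory; the 100000 COP threshold is an arbitrary value chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Sombrero Vueltiao", "Mochila Arhuaca", "Ruana", "Artesanía en Guadua"],
    "Quantity": [15, 8, 12, 20],
    "Price (COP)": [93500.0, 132000.0, 104500.0, 33000.0],
})
df["In Stock"] = df["Quantity"] > 10

# The boolean column itself is the mask
in_stock = df[df["In Stock"]]

# ~ negates a mask; & combines masks (wrap each comparison in parentheses)
out_of_stock = df[~df["In Stock"]]
cheap_in_stock = df[df["In Stock"] & (df["Price (COP)"] < 100000)]

# loc[mask, column] filters rows and picks a column in one step
names = df.loc[df["In Stock"], "Item"]

print(in_stock["Item"].tolist())        # ['Sombrero Vueltiao', 'Ruana', 'Artesanía en Guadua']
print(out_of_stock["Item"].tolist())    # ['Mochila Arhuaca']
print(cheap_in_stock["Item"].tolist())  # ['Sombrero Vueltiao', 'Artesanía en Guadua']
```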


## Exercise 5: Reindexing and Dropping Data (Focus: Indexing Objects)

- **Scenario:** You have sales data for different fruits in a local market in Dosquebradas, but the order of days is a bit mixed up, and you need to remove some data.

- **Task:**

1. Create a Series fruit_sales with the following data:

    - Sales (kg): 25, 30, 18, 22, 35

    - Index (Day): 'Wednesday', 'Monday', 'Friday', 'Tuesday', 'Thursday'

2. Reindex the fruit_sales Series to be in chronological order of days of the week (Monday, Tuesday, Wednesday, Thursday, Friday). Fill any missing values with 0.

3. Create a new DataFrame monthly_expenses with two columns: 'Category' and 'Amount (COP)', and three rows: 'Rent', 'Utilities', 'Salaries' with amounts 1500000, 300000, 2500000 respectively.

4. Drop the 'Utilities' row from the monthly_expenses DataFrame and print the result. (Ensure you understand if it's an in-place operation or returns a new DataFrame).

In [164]:
# Creating DF
fruit_sale= {
    "Sales (kg)": [25, 30, 18, 22, 35],
    "Day": ['Wednesday', 'Monday', 'Friday', 'Tuesday', 'Thursday']
}
df_fruit_sale= pd.DataFrame(fruit_sale).set_index("Day")
df_fruit_sale

| Day | Sales (kg) |
|---|---|
| Wednesday | 25 |
| Monday | 30 |
| Friday | 18 |
| Tuesday | 22 |
| Thursday | 35 |


In [166]:
# Reindexing
df_fruit_sale.reindex(["Monday", 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], fill_value= 0)

| Day | Sales (kg) |
|---|---|
| Monday | 30 |
| Tuesday | 22 |
| Wednesday | 25 |
| Thursday | 35 |
| Friday | 18 |


----
***Note**: `.reindex()` conforms the data to a new index: labels already present keep their values, labels not in the current index get NaN unless `fill_value` is given (here `0`), and current labels missing from the new index are dropped.*
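The task asks for `fruit_sales` as a Series, and `reindex()` behaves the same way there. A sketch that adds a hypothetical 'Saturday' label (not in the data) to show the NaN versus `fill_value` difference, and that labels left out of the new index are dropped:

```python
import pandas as pd

# fruit_sales as a Series, as the task describes (values in kg)
fruit_sales = pd.Series(
    [25, 30, 18, 22, 35],
    index=["Wednesday", "Monday", "Friday", "Tuesday", "Thursday"],
    name="Sales (kg)",
)

# 'Saturday' is a hypothetical extra day that is not in the current index
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

with_nan = fruit_sales.reindex(days)                 # missing label -> NaN (values become float64)
with_zero = fruit_sales.reindex(days, fill_value=0)  # missing label -> 0

print(with_nan["Saturday"])  # nan
print(with_zero.tolist())    # [30, 22, 25, 35, 18, 0]

# Labels not listed in the new index are dropped from the result
print(fruit_sales.reindex(["Monday", "Friday"]).tolist())  # [30, 18]
```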

In [193]:
# Create new DF
# (a second 'Utilities' row is included to show that the second way finds every match)
monthly_expenses = {
    "Category": ['Rent', 'Utilities', 'Salaries', 'Utilities'],
    "Amount (COP)": [1500000, 300000, 2500000, 1203600],
}

df_monthly_expenses = pd.DataFrame(monthly_expenses)

# Drop entries
# First way: hardcode the row label (drops only label 1)
dropped_entry = df_monthly_expenses.drop(1)

# Second way: look up the labels of every 'Utilities' row
indexes = df_monthly_expenses[df_monthly_expenses["Category"] == "Utilities"].index
dropped_entry = df_monthly_expenses.drop(indexes)
print(indexes)
dropped_entry


Index([1, 3], dtype='int64')


|   | Category | Amount (COP) |
|---|---|---|
| 0 | Rent | 1500000 |
| 2 | Salaries | 2500000 |
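Since the task identifies rows by category, another option is to make 'Category' the index and drop by label; if the label appears more than once, every matching row is removed. A sketch with the three rows from the task, alongside a boolean-mask alternative that keeps the default index:

```python
import pandas as pd

monthly_expenses = pd.DataFrame({
    "Category": ["Rent", "Utilities", "Salaries"],
    "Amount (COP)": [1500000, 300000, 2500000],
})

# With 'Category' as the index, drop() removes the row by its label directly
by_category = monthly_expenses.set_index("Category")
without_utilities = by_category.drop("Utilities")
print(without_utilities.index.tolist())  # ['Rent', 'Salaries']

# Same rows without touching the index: keep every row whose Category differs
kept = monthly_expenses[monthly_expenses["Category"] != "Utilities"]
print(kept["Category"].tolist())  # ['Rent', 'Salaries']
```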


----
***Note**: `drop()` returns a new DataFrame by default and leaves the original unchanged; passing `inplace=True` modifies the original instead, and the call then returns `None`.*
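A minimal sketch of both behaviours, using a small hypothetical `expenses` frame indexed by category:

```python
import pandas as pd

expenses = pd.DataFrame(
    {"Amount (COP)": [1500000, 300000, 2500000]},
    index=["Rent", "Utilities", "Salaries"],
)

# Default: drop() returns a new DataFrame and leaves 'expenses' untouched
trimmed = expenses.drop("Utilities")
print(len(expenses), len(trimmed))  # 3 2

# inplace=True modifies 'expenses' itself and returns None
result = expenses.drop("Utilities", inplace=True)
print(result is None, len(expenses))  # True 2
```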