# **UK Fruit and Vegetable Prices 2017-2022: Data Cleaning and EDA**

## Objectives

* Fetch the data from Kaggle and save it as raw data under the Dataset/Raw folder
* EDA: use pandas and numpy to understand the data better by exploring the relationships between factors
* Run tests for the hypotheses.

## Inputs

* Dataset from Kaggle https://www.kaggle.com/datasets/datota/fruit-and-vegatable-prices-in-uk-2017-2022
* import pandas and numpy libraries - Dataset has text and numerical values

## Outputs

* Create a cleaned CSV file and save to Dataset/Processed
* Use the cleaned CSV file for data visualisations in a .ipynb notebook

## Additional Comments

* Rows in the 'cut_flowers' and 'pot_plants' categories were removed from the dataset
* Converted dates to datetime in a temporary copy and assigned on, off and shoulder seasons for 'strawberries'



---

# Change working directory

* We are assuming you will store the notebooks in a subfolder, therefore when running the notebook in the editor, you will need to change the working directory

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\ngubo\\Documents\\vscode-projects\\Capstone_Project_Fruit_Veg_Prices_UK\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\ngubo\\Documents\\vscode-projects\\Capstone_Project_Fruit_Veg_Prices_UK'
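Re-running the `os.chdir` cell moves one folder further up each time it is executed. A minimal sketch of a guard that only changes directory while still inside the notebook folder, assuming that folder is named `jupyter_notebooks` as in the path printed above. `move_to_project_root` is a hypothetical helper, not part of the notebook:

```python
import os
from pathlib import Path

# Assumption: the notebooks live in a folder with this name (taken from the printed path)
NOTEBOOK_FOLDER = "jupyter_notebooks"


def move_to_project_root(notebook_folder: str = NOTEBOOK_FOLDER) -> Path:
    """Change to the parent directory only when currently inside the notebook folder."""
    current = Path.cwd()
    if current.name == notebook_folder:
        os.chdir(current.parent)
    # Return the working directory after the (possible) change
    return Path.cwd()
```

Calling `move_to_project_root()` a second time leaves the working directory unchanged, so the cell can be re-run safely.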

Import CSV file and Libraries to convert the CSV file into a DataFrame

In [4]:
from pathlib import Path

import pandas as pd
import numpy as np

df = pd.read_csv('Dataset/Raw/fruitvegprices-2017_2022.csv')

print(df)

         category           item            variety        date  price  unit
0           fruit         apples  bramleys_seedling  2022-03-11   2.05    kg
1           fruit         apples  coxs_orange_group  2022-03-11   1.22    kg
2           fruit         apples    egremont_russet  2022-03-11   1.14    kg
3           fruit         apples           braeburn  2022-03-11   1.05    kg
4           fruit         apples               gala  2022-03-11   1.03    kg
...           ...            ...                ...         ...    ...   ...
9642  cut_flowers    alstromeria             indoor  2017-11-03   0.27  stem
9643  cut_flowers  chrysanthemum       indoor_spray  2017-11-03   0.22  stem
9644  cut_flowers        lillies           oriental  2017-11-03   0.70  stem
9645  cut_flowers      narcissus             indoor  2017-11-03   0.06  stem
9646   pot_plants       cyclamen              13_cm  2017-11-03   0.75  unit

[9647 rows x 6 columns]


In [5]:
df.info() # Get information about the DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9647 entries, 0 to 9646
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   category  9647 non-null   object 
 1   item      9647 non-null   object 
 2   variety   9647 non-null   object 
 3   date      9647 non-null   object 
 4   price     9647 non-null   float64
 5   unit      9647 non-null   object 
dtypes: float64(1), object(5)
memory usage: 452.3+ KB
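`df.info()` reports `date` as `object`, so the dates are still plain strings at this point. One option is to let `read_csv` parse the column while loading through its `parse_dates` parameter. The sketch below uses a few rows copied from the output above rather than the real file:

```python
import io

import pandas as pd

# Three rows copied from the printed DataFrame, used in place of the real CSV file
sample_csv = io.StringIO(
    "category,item,variety,date,price,unit\n"
    "fruit,apples,bramleys_seedling,2022-03-11,2.05,kg\n"
    "fruit,apples,gala,2022-03-11,1.03,kg\n"
    "pot_plants,cyclamen,13_cm,2017-11-03,0.75,unit\n"
)

# parse_dates converts the named column to datetime64 during loading
sample_df = pd.read_csv(sample_csv, parse_dates=["date"])
print(sample_df.dtypes)
```

With a datetime column, `.dt.month` and `.dt.year` are available without a separate `pd.to_datetime` step.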


In [6]:
df.describe() # Get a summary of the DataFrame

Unnamed: 0,price
count,9647.0
mean,1.528333
std,1.927865
min,0.02
25%,0.54
50%,0.88
75%,1.5
max,17.6


In [7]:
df.dtypes # Check the data types of each column

category     object
item         object
variety      object
date         object
price       float64
unit         object
dtype: object

In [8]:
df.isnull().sum() # Check for missing values in each column 

category    0
item        0
variety     0
date        0
price       0
unit        0
dtype: int64

In [9]:
duplicate = df.duplicated().sum() # Check for duplicate rows in the DataFrame
print(duplicate)

0
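The dtype, missing-value and duplicate checks above can also be gathered into one table. A minimal sketch, where `quality_report` is a hypothetical helper and the sample rows are illustrative rather than taken from the dataset:

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "unique": frame.nunique(),  # nunique() ignores missing values by default
    })


# Illustrative sample with one missing price
sample = pd.DataFrame({
    "item": ["apples", "apples", "pears"],
    "price": [2.05, 1.22, None],
})

report = quality_report(sample)
duplicate_rows = int(sample.duplicated().sum())
print(report)
print(f"Duplicate rows: {duplicate_rows}")
```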


Check the Value counts of each column

In [10]:
df['item'].value_counts() # Check the Value counts of each column

item
apples                  1077
cabbage                 1005
lettuce                  666
onion                    543
tomatoes                 445
pears                    312
capsicum                 303
rhubarb                  235
beans                    230
spring_greens            218
cauliflower              218
celeriac                 218
beetroot                 218
pak_choi                 218
curly_kale               217
carrots                  214
swede                    212
parsnips                 211
leeks                    208
turnip                   193
cucumbers                164
brussels_sprouts         143
strawberries             138
coriander                131
spinach_leaf             129
celery                   124
calabrese                116
mixed_babyleaf_salad     115
rocket                   109
raspberries              101
tulips                    97
watercress                92
chinese_leaf              86
courgettes                81
plums    

In [11]:
df['category'].value_counts() # Check the Value counts of each column   

category
vegetable      7264
fruit          1992
cut_flowers     342
pot_plants       49
Name: count, dtype: int64

In [12]:
df['variety'].value_counts() # Check the Value counts of each column

variety
red                  353
all_varieties        279
all                  263
bramleys_seedling    218
little_gem           218
                    ... 
indoor_spray          13
stocks                 7
peony                  7
sweet_williams         6
9_cm                   5
Name: count, Length: 77, dtype: int64

In [13]:
df['date'].value_counts() # Check the Value counts of each column

date
2018-09-28    65
2018-09-14    63
2018-09-07    62
2019-08-30    62
2019-08-16    61
              ..
2021-02-05    30
2021-01-08    30
2020-02-28    29
2020-04-10    29
2020-04-03    28
Name: count, Length: 218, dtype: int64

In [14]:
df['price'].value_counts() # Check the Value counts of each column

price
0.50     129
0.47     128
0.46     124
0.52     107
0.60      99
        ... 
8.61       1
8.08       1
8.09       1
9.24       1
11.00      1
Name: count, Length: 835, dtype: int64

In [15]:
df['unit'].value_counts() # Check the Value counts of each column

unit
kg      7888
head    1150
stem     342
twin     218
unit      49
Name: count, dtype: int64

* The 'category' column contains 'cut_flowers' and 'pot_plants'.
* As my strategy is to look only at fruit and veg, I will drop these rows.
* Drop rows that aren't fruit or veg.

In [16]:
# drop rows that aren't fruit or veg
df = df[df['category'].str.lower() != 'cut_flowers']
df = df[df['category'].str.lower() != 'pot_plants']

# reset index after dropping rows
df = df.reset_index(drop=True)

# check results
print(df['category'].unique())

['fruit' 'vegetable']
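The same filter can be written as a single step by listing the categories to keep instead of the ones to drop. A sketch on illustrative rows (the capitalised 'Vegetable' is a made-up value to show why the lowercase step matters), assuming only 'fruit' and 'vegetable' should remain:

```python
import pandas as pd

# Illustrative rows; the item names appear in the dataset, the casing is hypothetical
sample = pd.DataFrame({
    "category": ["fruit", "Vegetable", "cut_flowers", "pot_plants"],
    "item": ["apples", "cabbage", "tulips", "cyclamen"],
})

keep = ["fruit", "vegetable"]

# Normalise the case once, then keep only the wanted categories
sample["category"] = sample["category"].str.lower()
filtered = sample[sample["category"].isin(keep)].reset_index(drop=True)
print(filtered)
```

Listing the categories to keep means any new non-food category in a future version of the data is excluded without further changes.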


New DataFrame shows only Fruit and Vegetable

In [17]:
df['category'].value_counts() # Check the Value counts of each column with the updated DataFrame

category
vegetable    7264
fruit        1992
Name: count, dtype: int64

In [18]:
df['category'] = df['category'].str.lower() # convert all category values to lowercase

fruits_df = df[df['category'] == 'fruit'] # filter the DataFrame to only include rows where category is 'fruit'

print(f'Total number of fruit rows: {len(fruits_df)}') # print the number of rows in the fruits DataFrame

Total number of fruit rows: 1992


Focus on a single item (apples) and count how often it appears


In [19]:
# Normalize the 'item' column to lowercase
df['item'] = df['item'].str.lower()

# Count occurrences of all items
item_counts = df['item'].value_counts()

# Get the count for apples
apples_count = item_counts.get('apples', 0)  # returns 0 if 'apples' not found

print(f"Apples appear {apples_count} times in the dataset.")

Apples appear 1077 times in the dataset.


Create a separate dataset for each fruit

In [20]:
fruit_datasets = {}

# Step 1: Create fruits_df once, normalize 'item' column
fruits_df = df[df['category'] == 'fruit'].copy()
fruits_df['item'] = fruits_df['item'].str.lower()

# Step 2: Get unique fruits
unique_fruits = fruits_df['item'].unique()

# Step 3: Loop through each fruit and create filtered datasets
for fruit in unique_fruits:
    fruit_data = fruits_df[fruits_df['item'] == fruit].copy()
    
    # Convert 'date' to datetime
    fruit_data['date'] = pd.to_datetime(fruit_data['date'])
    
    # Select only needed columns
    fruit_datasets[fruit] = fruit_data[['category', 'variety', 'price', 'unit', 'date']]
    
    print(f"Created dataset for {fruit}: {len(fruit_datasets[fruit])} records")

Created dataset for apples: 1077 records
Created dataset for pears: 312 records
Created dataset for raspberries: 101 records
Created dataset for strawberries: 138 records
Created dataset for blackberries: 60 records
Created dataset for currants: 64 records
Created dataset for blueberries: 73 records
Created dataset for plums: 81 records
Created dataset for cherries: 45 records
Created dataset for gooseberries: 41 records


In [21]:
# Check the fruit with the highest count 'apples'
apples_df = fruit_datasets.get('apples')
print(apples_df)

     category            variety  price unit       date
0       fruit  bramleys_seedling   2.05   kg 2022-03-11
1       fruit  coxs_orange_group   1.22   kg 2022-03-11
2       fruit    egremont_russet   1.14   kg 2022-03-11
3       fruit           braeburn   1.05   kg 2022-03-11
4       fruit               gala   1.03   kg 2022-03-11
...       ...                ...    ...  ...        ...
9209    fruit    egremont_russet   0.82   kg 2017-11-03
9210    fruit           braeburn   0.64   kg 2017-11-03
9211    fruit               gala   0.79   kg 2017-11-03
9212    fruit   other_mid_season   0.76   kg 2017-11-03
9213    fruit  other_late_season   0.75   kg 2017-11-03

[1077 rows x 5 columns]


Check unique varieties on focused item 'apples'

In [22]:
df[df['item'] == 'apples']['variety'].unique()

array(['bramleys_seedling', 'coxs_orange_group', 'egremont_russet',
       'braeburn', 'gala', 'other_late_season', 'other_mid_season',
       'other_early_season'], dtype=object)

Average price per variety on item 'apples'

In [23]:
# Analyze the mean price of different apple varieties
(
    df[df['item'] == 'apples']
    .groupby('variety')
    .agg(mean_price=('price', 'mean'), unit=('unit', 'first'))
    .sort_values('mean_price')
    .assign(mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format)) # Format mean_price as UK currency GBP(£)
)

Unnamed: 0_level_0,mean_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1
other_mid_season,£0.81,kg
braeburn,£0.90,kg
other_early_season,£0.91,kg
other_late_season,£0.95,kg
gala,£0.95,kg
coxs_orange_group,£0.98,kg
egremont_russet,£1.16,kg
bramleys_seedling,£1.22,kg


Use the standard deviation (std) to see the price volatility of each apple variety, including the early, mid and late season groups

In [24]:
# Analyze the standard deviation of prices for different apple varieties
(
    df[df['item'] == 'apples']
    .groupby('variety')
    .agg(std_price=('price', 'std'), unit=('unit', 'first'))
    .sort_values('std_price')
    .assign(std_price=lambda d: d['std_price'].map('£{:.2f}'.format))
)

Unnamed: 0_level_0,std_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1
gala,£0.15,kg
braeburn,£0.15,kg
other_mid_season,£0.21,kg
egremont_russet,£0.22,kg
other_early_season,£0.23,kg
coxs_orange_group,£0.23,kg
other_late_season,£0.29,kg
bramleys_seedling,£0.46,kg


Create a DataFrame showing both the mean and std price to better compare the different varieties of apples, including the early, mid and late season groups

In [25]:
(
    df[df['item'] == 'apples']
    .groupby('variety')
    .agg(
        mean_price=('price', 'mean'),
        std_price=('price', 'std'),
        unit=('unit', 'first'),
    )
    .sort_values(['mean_price', 'std_price'])
    .assign(
        mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format),
        std_price=lambda d: d['std_price'].map('£{:.2f}'.format),
    )
)

Unnamed: 0_level_0,mean_price,std_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
other_mid_season,£0.81,£0.21,kg
braeburn,£0.90,£0.15,kg
other_early_season,£0.91,£0.23,kg
other_late_season,£0.95,£0.29,kg
gala,£0.95,£0.15,kg
coxs_orange_group,£0.98,£0.23,kg
egremont_russet,£1.16,£0.22,kg
bramleys_seedling,£1.22,£0.46,kg


Braeburn and Gala are relatively affordable — probably due to wide availability and longer storage life.

Cox's Orange Group and Egremont Russet are a bit more niche — the prices reflect that.

Bramley's Seedling (mostly used for cooking) is the most expensive on average — could reflect lower supply or strong demand for cooking apples.
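The mean and std summary is repeated for several items in this notebook, so it could be wrapped in a small function. A sketch under the assumption that the frame has 'item', 'price' and 'unit' columns; `price_summary` is a hypothetical helper and the sample prices are illustrative:

```python
import pandas as pd


def price_summary(frame: pd.DataFrame, item: str, by: str = "variety") -> pd.DataFrame:
    """Mean and standard deviation of price for one item, grouped by the `by` column."""
    return (
        frame[frame["item"] == item]
        .groupby(by)
        .agg(
            mean_price=("price", "mean"),
            std_price=("price", "std"),
            unit=("unit", "first"),
        )
        .sort_values(["mean_price", "std_price"])
    )


# Illustrative prices, not taken from the dataset
sample = pd.DataFrame({
    "item": ["apples"] * 4 + ["cabbage"] * 2,
    "variety": ["gala", "gala", "braeburn", "braeburn", "red", "red"],
    "price": [1.00, 1.10, 0.90, 1.00, 0.50, 0.60],
    "unit": ["kg"] * 6,
})

summary = price_summary(sample, "apples")
print(summary.round(2))
```

The function returns numeric columns, so the result can still be sorted or plotted; the '£' formatting can be applied afterwards for display only.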



Create a separate dataset for each vegetable

In [26]:
vegetable_datasets = {}

# Step 1: Create vegetables_df once, normalize 'item' column
vegetables_df = df[df['category'] == 'vegetable'].copy()
vegetables_df['item'] = vegetables_df['item'].str.lower()

# Step 2: Get unique vegetables
unique_vegetables = vegetables_df['item'].unique()

# Step 3: Loop through each vegetable and create filtered datasets
for vegetable in unique_vegetables:
    vegetable_data = vegetables_df[vegetables_df['item'] == vegetable].copy()
    
    # Convert 'date' to datetime
    vegetable_data['date'] = pd.to_datetime(vegetable_data['date'])

    # Select only needed columns
    vegetable_datasets[vegetable] = vegetable_data[['category', 'variety', 'price', 'unit', 'date']]

    print(f"Created dataset for {vegetable}: {len(vegetable_datasets[vegetable])} records")

Created dataset for beetroot: 218 records
Created dataset for brussels_sprouts: 143 records
Created dataset for pak_choi: 218 records
Created dataset for curly_kale: 217 records
Created dataset for cabbage: 1005 records
Created dataset for spring_greens: 218 records
Created dataset for carrots: 214 records
Created dataset for cauliflower: 218 records
Created dataset for celeriac: 218 records
Created dataset for cucumbers: 164 records
Created dataset for leeks: 208 records
Created dataset for lettuce: 666 records
Created dataset for onion: 543 records
Created dataset for swede: 212 records
Created dataset for turnip: 193 records
Created dataset for parsnips: 211 records
Created dataset for rhubarb: 235 records
Created dataset for capsicum: 303 records
Created dataset for chinese_leaf: 86 records
Created dataset for celery: 124 records
Created dataset for tomatoes: 445 records
Created dataset for coriander: 131 records
Created dataset for spinach_leaf: 129 records
Created dataset for cal

Count the records for each vegetable, ordered from highest to lowest count

In [27]:
veg_counts_df = vegetables_df['item'].value_counts().reset_index() # value_counts() already sorts from highest to lowest count
veg_counts_df.columns = ['vegetable', 'count'] # Rename the columns for clarity

print(veg_counts_df)   

               vegetable  count
0                cabbage   1005
1                lettuce    666
2                  onion    543
3               tomatoes    445
4               capsicum    303
5                rhubarb    235
6                  beans    230
7               celeriac    218
8               beetroot    218
9            cauliflower    218
10              pak_choi    218
11         spring_greens    218
12            curly_kale    217
13               carrots    214
14                 swede    212
15              parsnips    211
16                 leeks    208
17                turnip    193
18             cucumbers    164
19      brussels_sprouts    143
20             coriander    131
21          spinach_leaf    129
22                celery    124
23             calabrese    116
24  mixed_babyleaf_salad    115
25                rocket    109
26            watercress     92
27          chinese_leaf     86
28            courgettes     81
29                  peas     80
30      

In [28]:
# check the vegetable with the highest count 'cabbage'
cabbage_df = vegetable_datasets.get('cabbage')
print(cabbage_df)

       category                variety  price  unit       date
12    vegetable                    red   0.59    kg 2022-03-11
13    vegetable                  savoy   0.51  head 2022-03-11
15    vegetable                  white   0.41    kg 2022-03-11
16    vegetable      round_green_other   0.46  head 2022-03-11
43    vegetable                    red   0.56    kg 2022-03-04
...         ...                    ...    ...   ...        ...
9223  vegetable                    red   0.42    kg 2017-11-03
9224  vegetable                  savoy   0.40  head 2017-11-03
9226  vegetable  summer_autumn_pointed   0.49    kg 2017-11-03
9227  vegetable                  white   0.34    kg 2017-11-03
9228  vegetable      round_green_other   0.39  head 2017-11-03

[1005 rows x 5 columns]


In [29]:
df[df['item'] == 'cabbage']['variety'].unique() # get unique varieties of cabbage

array(['red', 'savoy', 'white', 'round_green_other',
       'summer_autumn_pointed'], dtype=object)

In [30]:
# Analyze the mean price of different cabbage varieties
(
    df[df['item'] == 'cabbage']
    .groupby('variety')
    .agg(mean_price=('price', 'mean'), unit=('unit', 'first'))
    .sort_values('mean_price')
    .assign(mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format))
)

Unnamed: 0_level_0,mean_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1
white,£0.48,kg
round_green_other,£0.50,head
savoy,£0.52,head
red,£0.53,kg
summer_autumn_pointed,£0.60,kg


In [31]:
# Analyze the standard deviation of prices for different cabbage varieties
(
    df[df['item'] == 'cabbage']
    .groupby('variety')
    .agg(std_price=('price', 'std'), unit=('unit', 'first'))
    .sort_values('std_price')
    .assign(std_price=lambda d: d['std_price'].map('£{:.2f}'.format))
)

Unnamed: 0_level_0,std_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1
summer_autumn_pointed,£0.08,kg
round_green_other,£0.08,head
savoy,£0.11,head
white,£0.14,kg
red,£0.16,kg


As with 'apples', the mean and std price are combined for 'cabbage' to better understand price fluctuations

In [32]:
(
    df[df['item'] == 'cabbage']
    .groupby('variety')
    .agg(
        mean_price=('price', 'mean'),
        std_price=('price', 'std'),
        unit=('unit', 'first'),
    )
    .sort_values(['mean_price', 'std_price'])
    .assign(
        mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format),
        std_price=lambda d: d['std_price'].map('£{:.2f}'.format),
    )
)

Unnamed: 0_level_0,mean_price,std_price,unit
variety,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
white,£0.48,£0.14,kg
round_green_other,£0.50,£0.08,head
savoy,£0.52,£0.11,head
red,£0.53,£0.16,kg
summer_autumn_pointed,£0.60,£0.08,kg


Round Green Other and Summer Autumn Pointed are the most price-stable, with the lowest standard deviation (only £0.08).

White cabbage is the cheapest overall, but slightly more volatile than the most stable varieties.

Red cabbage has the highest volatility, though still relatively low.

Round green other and Savoy cabbages are priced per head, others per kg — this could affect how pricing comparisons are interpreted for sales or packaging.
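Because the summaries take the first unit seen for each variety, it is worth confirming that no variety mixes units before comparing prices. A minimal sketch using a handful of cabbage rows copied from the output above:

```python
import pandas as pd

# Rows copied from the cabbage output above
sample = pd.DataFrame({
    "variety": ["red", "red", "savoy", "savoy", "white"],
    "unit": ["kg", "kg", "head", "head", "kg"],
    "price": [0.59, 0.56, 0.51, 0.40, 0.41],
})

# Number of distinct units used by each variety
units_per_variety = sample.groupby("variety")["unit"].nunique()

# Varieties priced in more than one unit (empty when every variety uses a single unit)
mixed_unit_varieties = units_per_variety[units_per_variety > 1]

# Compare like with like: average price within each unit
mean_by_unit = sample.groupby("unit")["price"].mean()

print(mixed_unit_varieties)
print(mean_by_unit)
```

If `mixed_unit_varieties` is empty, taking the first unit per variety is safe, and per-kg and per-head prices can be reported side by side rather than ranked against each other.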


In [33]:
df[df['item'] == 'strawberries']['variety'].unique() # get unique varieties of strawberries

array(['strawberries'], dtype=object)

* Searched online for information on the seasonality of 'strawberries'
* The UK season runs from May to August, with June and July considered the peak
* Create a temporary mapping of 'on' and 'off' seasons for 'strawberries'

In [34]:
on_season_months = [5, 6, 7, 8]      # May to August
off_season_months = [1, 2, 3, 10, 11, 12]  # Jan–Mar, Oct–Dec

In [None]:
# Convert date column temporarily for season analysis
df_temp = df.copy()
df_temp['date'] = pd.to_datetime(df_temp['date'])
df_temp['month'] = df_temp['date'].dt.month
df_temp['year'] = df_temp['date'].dt.year
df_temp['Season'] = np.where(df_temp['month'].isin(on_season_months), 'On-Season',
                             np.where(df_temp['month'].isin(off_season_months), 'Off-Season', 'Shoulder-Season'))

# Filter for one item (e.g. Strawberries)
strawberries = df_temp[df_temp['item'] == 'strawberries']
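The nested `np.where` assigns every month that is in neither list (April and September) to 'Shoulder-Season'. The same rule can be sketched as a small function mapped over the month numbers; `season_for_month` is a hypothetical helper and the dates are illustrative:

```python
import pandas as pd

on_season_months = [5, 6, 7, 8]            # May to August
off_season_months = [1, 2, 3, 10, 11, 12]  # Jan to Mar, Oct to Dec


def season_for_month(month: int) -> str:
    """Return the season label for a month number (1 to 12)."""
    if month in on_season_months:
        return "On-Season"
    if month in off_season_months:
        return "Off-Season"
    # Remaining months: April and September
    return "Shoulder-Season"


# Illustrative dates, one per case
dates = pd.to_datetime(pd.Series(["2021-06-18", "2021-01-08", "2021-04-09", "2021-09-10"]))
seasons = dates.dt.month.map(season_for_month)
print(seasons.tolist())
```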

In [37]:
(
    df_temp[df_temp['item'] == 'strawberries']
    .groupby('Season')
    .agg(mean_price=('price', 'mean'), unit=('unit', 'first'))
    .sort_values('mean_price')
    .assign(mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format))
)

Unnamed: 0_level_0,mean_price,unit
Season,Unnamed: 1_level_1,Unnamed: 2_level_1
On-Season,£3.10,kg
Off-Season,£3.94,kg
Shoulder-Season,£4.73,kg


In [38]:
(
    df_temp[df_temp['item'] == 'strawberries']
    .groupby('Season')
    .agg(std_price=('price', 'std'), unit=('unit', 'first'))
    .sort_values('std_price')
    .assign(std_price=lambda d: d['std_price'].map('£{:.2f}'.format))
)

Unnamed: 0_level_0,std_price,unit
Season,Unnamed: 1_level_1,Unnamed: 2_level_1
On-Season,£1.36,kg
Shoulder-Season,£1.99,kg
Off-Season,£2.18,kg


On-season (£1.36): Prices are more stable — likely due to local supply being strong.

Off-season (£2.18): Higher volatility, possibly from imports, storage issues, or supply chain instability.

Shoulder-season (£1.99): Transitional period — prices vary as supply starts or winds down.

# Further Analysis 

On-season strawberries are the most stable: prices are predictable when the local UK harvest and supply are at their fullest.

Off-season strawberries have the highest price volatility — this makes sense because:
- According to online sources they're imported from places like Spain, Morocco, Egypt or the Netherlands 
- Subject to higher transport, climate, and supply chain variability

Shoulder-season (start/end of the growing period) is also unstable — prices jump depending on early/late harvests or limited supply.


In [39]:
(
    df_temp[df_temp['item'] == 'strawberries']
    .groupby('Season')
    .agg(
        mean_price=('price', 'mean'),
        std_price=('price', 'std'),
        unit=('unit', 'first'),
    )
    .sort_values(['mean_price', 'std_price'])
    .assign(
        mean_price=lambda d: d['mean_price'].map('£{:.2f}'.format),
        std_price=lambda d: d['std_price'].map('£{:.2f}'.format),
    )
)

Unnamed: 0_level_0,mean_price,std_price,unit
Season,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
On-Season,£3.10,£1.36,kg
Off-Season,£3.94,£2.18,kg
Shoulder-Season,£4.73,£1.99,kg
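A mean and std are only as reliable as the number of records behind them, and the strawberry records are split across three seasons. A sketch that adds a record count to the same aggregation, using made-up prices so that the small-sample effect is visible:

```python
import pandas as pd

# Made-up prices; the shoulder season deliberately has a single record
sample = pd.DataFrame({
    "Season": ["On-Season", "On-Season", "On-Season",
               "Off-Season", "Off-Season", "Shoulder-Season"],
    "price": [3.0, 3.2, 2.8, 4.0, 3.6, 4.7],
})

summary = (
    sample.groupby("Season")
    .agg(
        n_obs=("price", "count"),    # number of price records in the season
        mean_price=("price", "mean"),
        std_price=("price", "std"),  # NaN when a season has only one record
    )
    .sort_values("mean_price")
)
print(summary)
```

Reporting `n_obs` next to the mean and std makes it easier to judge how much weight each season's figures can carry.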


# Key insights

1. On-season: Best balance of price and stability

- Mean price: Lowest (£3.10/kg)

- Volatility: Lowest (std £1.36)

- Why? Local harvest, consistent supply, lower transport costs, no tariffs.

Best time for consumers and retailers — lower prices and more predictability.


2. Off-season: Higher prices, highest volatility

- Mean price: Mid-range (£3.94/kg)

- Volatility: Highest (std £2.18)

- Why? Imported supply (higher costs, more disruption); prices fluctuate based on origin, weather, fuel costs, etc.

More expensive and risky for retailers — can affect margins and availability.


3. Shoulder-season: Most expensive, still volatile

- Mean price: Highest (£4.73/kg)

- Volatility: High (std £1.99)

- Why? Supply is limited and inconsistent (early or late season crops); it might be a mix of local and imported fruit, which adds variability; prices can spike due to novelty or limited stock.

Consumers pay the most here, even though volatility is slightly less than off-season.



Summary: What Does It Say About Strawberry Prices?

Strawberries are highly seasonal.
As they move out of the UK growing window, prices become increasingly unstable — especially in the off-season, where external factors dominate. This suggests significant pricing risk for both suppliers and consumers outside peak months.

The season columns were added only to the temporary copy df_temp, so df still holds just the fruit and veg rows without a seasonal tag

In [41]:
print(df)

       category          item            variety        date  price  unit
0         fruit        apples  bramleys_seedling  2022-03-11   2.05    kg
1         fruit        apples  coxs_orange_group  2022-03-11   1.22    kg
2         fruit        apples    egremont_russet  2022-03-11   1.14    kg
3         fruit        apples           braeburn  2022-03-11   1.05    kg
4         fruit        apples               gala  2022-03-11   1.03    kg
...         ...           ...                ...         ...    ...   ...
9251  vegetable  spinach_leaf      loose_bunches  2017-11-03   0.76    kg
9252  vegetable     sweetcorn          sweetcorn  2017-11-03   0.18  head
9253  vegetable      tomatoes              round  2017-11-03   0.97    kg
9254  vegetable      tomatoes               vine  2017-11-03   1.54    kg
9255  vegetable    watercress       pillow_packs  2017-11-03   8.04    kg

[9256 rows x 6 columns]


In [42]:
df.to_csv('Dataset/Processed/fruitvegprices-2017_2022-cleaned.csv', index=False)   
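`to_csv` raises an error if the target folder does not exist yet. A minimal sketch that creates the folder first and reloads the file to confirm the save; `save_processed` is a hypothetical helper, and a temporary directory with a made-up file name stands in for the project folder:

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_processed(frame: pd.DataFrame, out_path) -> Path:
    """Create the parent folder if needed, then write the frame without the index."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(out_path, index=False)
    return out_path


# One illustrative row; the temporary directory stands in for the project root
sample = pd.DataFrame({"category": ["fruit"], "item": ["apples"], "price": [2.05]})

with tempfile.TemporaryDirectory() as tmp:
    saved = save_processed(sample, Path(tmp) / "Dataset" / "Processed" / "sample-cleaned.csv")
    reloaded = pd.read_csv(saved)  # read back to confirm the file was written

print(reloaded)
```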


import plotly.graph_objects as go

# Create interactive line plot for VS Code
fig = go.Figure(data=go.Scatter(x=cabbage_df['date'], y=cabbage_df['price'], mode='lines+markers'))
fig.update_layout(
    title='Cabbage Price Over Time', 
    xaxis_title='Date', 
    yaxis_title='Price',
    width=800,
    height=500
)
fig.show()


---