Goal: Convert data from pickle to CSV files
Andrew Yew, 2023-10-31

In [1]:
import numpy as np
import pandas as pd
import joblib

In [2]:
data_file = "../11_raw_data/final_processed_df.pkl"

In [3]:
# Attempt 1: load the pickle with joblib
df = joblib.load(data_file)

ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

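The `ModuleNotFoundError` is most likely caused by a pandas version mismatch: the file appears to have been pickled under an older pandas, and the `pandas.core.indexes.numeric` module it references is no longer present in pandas 2.x. [StackOverflow reference](https://stackoverflow.com/questions/75953279/modulenotfounderror-no-module-named-pandas-core-indexes-numeric-using-metaflo)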
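The mechanism behind this kind of error can be reproduced with the standard library alone. A pickle stores the module path of each class, not the class itself, so loading fails once that module is gone. The sketch below is an illustration under stated assumptions: `oldlib_internal` and `NumericIndex` are hypothetical stand-ins, not real pandas names.

```python
import pickle
import sys
import types

# Hypothetical module standing in for an internal module of an older library.
legacy_module = types.ModuleType("oldlib_internal")


class NumericIndex:
    """Hypothetical class that lives in the legacy module."""

    def __init__(self, values):
        self.values = values


# Make pickle record the class as "oldlib_internal.NumericIndex".
NumericIndex.__module__ = "oldlib_internal"
NumericIndex.__qualname__ = "NumericIndex"
legacy_module.NumericIndex = NumericIndex
sys.modules["oldlib_internal"] = legacy_module

# Pickle an instance while the module still exists.
payload = pickle.dumps(NumericIndex([1, 2, 3]))

# Simulate upgrading to a release in which the module was removed.
del sys.modules["oldlib_internal"]

try:
    pickle.loads(payload)
    missing_module = None
except ModuleNotFoundError as exc:
    # exc.name holds the module path that could not be imported.
    missing_module = exc.name

print(missing_module)
```

The data in the payload is intact; only the import of the recorded module path fails, which is why matching the library version is usually enough to read the file again.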

In [5]:
print(pd.__version__)

2.0.2


Rather than creating a new environment with pandas < 2, first try the `read_pickle` method.

In [6]:
# Attempt 2: load the pickle with pd.read_pickle
df = pd.read_pickle(data_file)

In [7]:
print(type(df))

<class 'numpy.ndarray'>


In [9]:
df[0][0:5]

array(['Corned Beef Roast', 'Stout-Braised Lamb Shanks',
       'Mississippi Chicken', 'Lasagna Flatbread',
       'Prosciutto-Wrapped Pork Tenderloin with Crispy Sage'],
      dtype=object)

The `read_pickle` method was able to read the pickled data file, but it returned a numpy array rather than the original DataFrame. Next, try to rebuild the DataFrame from the numpy array, taking the column names from notebook 030.

In [11]:
# Feed the transpose of the data into the DataFrame method
df = pd.DataFrame(df.T)
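The transpose is needed because the loaded array appears to hold one column per row: `df[0]` returned recipe titles. A minimal sketch on a small stand-in array (three titles and their serving counts taken from the outputs in this notebook; the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for the loaded array: each row holds the values of one column.
columns_first = np.array(
    [
        ["Corned Beef Roast", "Mississippi Chicken", "Lasagna Flatbread"],
        [8, 4, 6],
    ],
    dtype=object,
)

# Transposing turns each stored column into a DataFrame column.
frame = pd.DataFrame(columns_first.T)

print(columns_first.shape)
print(frame.shape)
print(list(frame.columns))
```

Without a `columns=` argument, pandas labels the columns with integers, so the original column names are not recovered by this step.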

In [13]:
# Examine the resulting dataframe
print(df.shape)
print("")
print("")

display(df.head())

(33691, 9)




Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Corned Beef Roast,This corned beef roast is easy to prepare and ...,['Preheat the oven to 300 degrees F (150 degre...,Corned beef roast cooked in the oven for five ...,Preheat the oven to 300 degrees F (150 degrees...,1 (5 1/2 pound) corned beef brisket with spice...,8,corned-beef main-dish beef,"Big corned beef fan, Great alternative to boil..."
1,Stout-Braised Lamb Shanks,This hearty Irish lamb shank stew is perfect i...,"['Heat oil in a Dutch oven or large, wide pot ...",This hearty Irish lamb shank stew is perfect i...,"Heat oil in a Dutch oven or large, wide pot ov...",1 tablespoon vegetable oil 4 lamb shanks 1 oni...,4,european uk-and-ireland world-cuisine irish,Awesome. Though I thought it could use a bit (...
2,Mississippi Chicken,This is such a fun and simple recipe with only...,"['Gather ingredients, and preheat the oven to ...",Mississippi chicken recipe uses only 4 ingredi...,"Gather ingredients, and preheat the oven to 35...","2 pounds skinless, boneless chicken breasts 1 ...",4,meat-and-poultry chicken,Very different. Since my hubby loves peperonci...
3,Lasagna Flatbread,A simple lasagna pizza.,['Preheat oven to 375 degrees F (190 degrees C...,Give lasagna a quick and easy pizza makeover b...,Preheat oven to 375 degrees F (190 degrees C)....,1 (15 ounce) container ricotta cheese 1 (8 oun...,6,bread quick-bread,Delicious as pan Lasagna but so much easier on...
4,Prosciutto-Wrapped Pork Tenderloin with Crispy...,Turn your weeknight dinner into a fancy affair...,['Preheat oven to 350 degrees F (175 degrees C...,Prosciutto-wrapped pork tenderloin with crispy...,Preheat oven to 350 degrees F (175 degrees C)....,1 pound pork tenderloin salt and ground black ...,4,tenderloin meat-and-poultry pork,"Not sure where the 20 mins came in, it’s been ..."


The resulting DataFrame has no column names, and it has only 9 columns instead of the 31 recorded in the original 030 notebook.

Conclusion: it is better to create a separate environment with pandas < 2.

# Downgraded pandas to below version 2.0

In [1]:
import pandas as pd
import numpy as np
import joblib

In [2]:
data_file = "../11_raw_data/final_processed_df.pkl"

In [3]:
pd.__version__

'1.5.3'
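A small guard can make the version requirement explicit before any pickle is loaded. This is a sketch, assuming the pickles were written by pandas 1.x; the helper names `major_version` and `can_load_legacy_pickle` are hypothetical.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


def can_load_legacy_pickle(pandas_version: str) -> bool:
    # Assumption: the pickles in this project were written by pandas 1.x,
    # so any pandas with a major version below 2 is treated as compatible.
    return major_version(pandas_version) < 2


# The two versions that appear in this notebook.
print(can_load_legacy_pickle("1.5.3"))
print(can_load_legacy_pickle("2.0.2"))
```

In the notebook the check would be called with `pd.__version__`, raising or stopping early when it returns `False`.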

In [4]:
# try loading the same file again
df = joblib.load(data_file)

In [5]:
df.head()

Unnamed: 0,recipe_title,recipe_title_wc,average_rating,number_of_ratings,description,description_wc,additional_description,additional_description_wc,description_flavour_text,description_flavour_text_wc,...,sugarContent_g,number_of_servings,cook_time,prep_time,total_time,tags,published_quarter,review,recipe_worth_it,wc_per_instruction
0,Corned Beef Roast,3,4.4,68.0,This corned beef roast is easy to prepare and ...,21,['Preheat the oven to 300 degrees F (150 degre...,81,Corned beef roast cooked in the oven for five ...,26,...,2.0,8,300.0,15.0,315.0,corned-beef main-dish beef,2.0,"Big corned beef fan, Great alternative to boil...",0,27.0
1,Stout-Braised Lamb Shanks,3,4.5,45.0,This hearty Irish lamb shank stew is perfect i...,21,"['Heat oil in a Dutch oven or large, wide pot ...",191,This hearty Irish lamb shank stew is perfect i...,21,...,5.0,4,145.0,25.0,170.0,european uk-and-ireland world-cuisine irish,2.0,Awesome. Though I thought it could use a bit (...,1,40.25
3,Mississippi Chicken,2,4.8,4.0,This is such a fun and simple recipe with only...,33,"['Gather ingredients, and preheat the oven to ...",157,Mississippi chicken recipe uses only 4 ingredi...,19,...,,4,95.0,5.0,100.0,meat-and-poultry chicken,4.0,Very different. Since my hubby loves peperonci...,1,19.5
4,Lasagna Flatbread,2,4.5,42.0,A simple lasagna pizza.,4,['Preheat oven to 375 degrees F (190 degrees C...,99,Give lasagna a quick and easy pizza makeover b...,18,...,6.0,6,15.0,25.0,40.0,bread quick-bread,4.0,Delicious as pan Lasagna but so much easier on...,1,16.2
5,Prosciutto-Wrapped Pork Tenderloin with Crispy...,6,4.8,12.0,Turn your weeknight dinner into a fancy affair...,14,['Preheat oven to 350 degrees F (175 degrees C...,158,Prosciutto-wrapped pork tenderloin with crispy...,26,...,,4,35.0,10.0,50.0,tenderloin meat-and-poultry pork,2.0,"Not sure where the 20 mins came in, it’s been ...",1,26.333333


Success: all 31 columns were loaded correctly. Next, a short loop converts all of the pickled data files to CSV.

In [6]:
import os

In [9]:
file_list = os.listdir("../11_raw_data/")
print(file_list)

['recipes_lacking_rating_df.pkl', 'recipe_url_df.pkl', 'processed_df.pkl', 'recipe_label_df.pkl', 'json_data_df.pkl', 'final_processed_df.pkl', 'raw_data_df.pkl']
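Before converting, the mapping from pickle names to CSV names can be previewed without touching any file. The sketch below uses `pathlib` instead of string slicing and skips anything that is not a `.pkl` file. `raw_data_df.csv` is a hypothetical extra entry added only to show the filter, and `csv_target` is an illustrative helper name.

```python
from pathlib import PurePosixPath

RAW_DIR = PurePosixPath("../11_raw_data")

# File names from the listing above, plus one hypothetical non-pickle entry.
file_list = [
    "recipes_lacking_rating_df.pkl",
    "recipe_url_df.pkl",
    "processed_df.pkl",
    "recipe_label_df.pkl",
    "json_data_df.pkl",
    "final_processed_df.pkl",
    "raw_data_df.pkl",
    "raw_data_df.csv",  # hypothetical: a CSV left over from an earlier run
]


def csv_target(name: str) -> PurePosixPath:
    """Return the CSV path that corresponds to a pickle file name."""
    return (RAW_DIR / name).with_suffix(".csv")


# Only .pkl files are mapped; everything else is ignored.
targets = {name: csv_target(name) for name in file_list if name.endswith(".pkl")}

for source, target in targets.items():
    print(source, "->", target)
```

Filtering on the suffix keeps the conversion safe to re-run after CSV files already exist in the same directory.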


In [12]:
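for file in file_list:
    if not file.endswith(".pkl"):
        continue  # skip non-pickle files so the cell can be re-run safely
    df = joblib.load("../11_raw_data/" + file)
    # index=False leaves the DataFrame index out of the CSV
    df.to_csv("../11_raw_data/" + file[:-4] + ".csv", index=False)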

All pickle files were converted to CSV, and the original pickle files were then deleted.
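Since the pickles are removed afterwards, a round-trip check on the CSV output is a reasonable safeguard. The sketch below is not part of the original workflow: it uses a small stand-in frame whose column names come from the output above and whose index has a gap, as the loaded DataFrame does.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in frame; values are illustrative, index mimics the gap seen above.
original = pd.DataFrame(
    {
        "recipe_title": ["Corned Beef Roast", "Mississippi Chicken"],
        "average_rating": [4.4, 4.8],
        "sugarContent_g": [2.0, None],
    },
    index=[0, 3],
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "check.csv"
    # Same call as in the conversion loop: the index is not written.
    original.to_csv(csv_path, index=False)
    restored = pd.read_csv(csv_path)

same_shape = restored.shape == original.shape
same_columns = list(restored.columns) == list(original.columns)
index_kept = list(restored.index) == list(original.index)

print(same_shape, same_columns, index_kept)
```

Shape and column names survive the round trip, but with `index=False` the original index labels do not. If those labels matter downstream, writing the index to the CSV would preserve them.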