# Data Formats (1)

Materials:
* S. V. Makrushin, "Lecture 4: Data Formats"
* https://docs.python.org/3/library/json.html
* https://docs.python.org/3/library/pickle.html
* https://www.crummy.com/software/BeautifulSoup/bs4/doc.ru/bs4ru.html
* Wes McKinney, "Python for Data Analysis"

## Problems to work through together

In [1]:
import json

In [2]:
from bs4 import BeautifulSoup

1. Print all email addresses contained in the address book `addres-book.json`.

In [3]:
with open("./data/addres-book.json", "r") as file:
    data = json.load(file)

[item["email"] for item in data]

['faina@mail.ru', 'robert@mail.ru']

2. Print the phone numbers contained in the address book `addres-book.json`.

In [4]:
with open("./data/addres-book.json", "r") as file:
    data = json.load(file)

[[subitem["phone"] for subitem in item["phones"]] for item in data]

[['232-19-55', '+7 (916) 232-19-55'], ['111-19-55', '+7 (916) 445-19-55']]
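
The two tasks above assume that the address book is a list of records, each with an `email` field and a `phones` list whose items carry a `phone` field. A minimal sketch of the same traversal on an inline string follows; the sample records are made up, and the real file may contain more fields.

```python
import json

# Hypothetical sample that mimics the structure the cells above rely on
SAMPLE = """
[
  {"email": "alice@example.com",
   "phones": [{"phone": "111-11-11"}, {"phone": "222-22-22"}]},
  {"email": "bob@example.com",
   "phones": [{"phone": "333-33-33"}]}
]
"""

# json.loads parses a string; json.load reads from an open file object
book = json.loads(SAMPLE)

emails = [person["email"] for person in book]
# Flatten the nested phone lists into a single list
phones = [entry["phone"] for person in book for entry in person["phones"]]

print(emails)
print(phones)
```

Flattening is a design choice: the nested comprehension in the cell above keeps one inner list per person, which is preferable when the owner of each number matters.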

3. Using the data in `addres-book-q.xml`, build a list of dictionaries with the phone numbers of each person.

In [5]:
with open("./data/addres-book-q.xml", "r") as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')

result_list = []
for address in soup.find_all('address'):
    user_name = address.find("name").get_text()
    for phone in address.find_all('phone'):
        phone_str = phone.get_text()
        result_list.append({user_name: phone_str})

result_list

[{'Aicha Barki': '+ (213) 6150 4015'},
 {'Aicha Barki': '+ (213) 2173 5247'},
 {'Francisco Domingos': '+ (244-2) 325 023'},
 {'Francisco Domingos': '+ (244-2) 325 023'},
 {'Maria Luisa': '+ (244) 4232 2836'},
 {'Abraao Chanda': '+ (244-2) 325 023'},
 {'Abraao Chanda': '+ (244-2) 325 023'},
 {'Beatriz Busaniche': '+ (54-11) 4784 1159'},
 {'Francesca Beddie': '+ (61-2) 6274 9500'},
 {'Francesca Beddie': '+ (61-2) 6274 9513'},
 {'Graham John Smith': '+ (61-3) 9807 4702'}]
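
The result above holds one dictionary per phone number, so a person with two numbers appears twice. If one dictionary per person is preferred, the numbers can be grouped, as in the sketch below. It uses the standard-library `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra packages, and the inline XML layout is an assumption modelled on the tag names used above (`address`, `name`, `phone`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the tag names come from the notebook
SAMPLE = """
<address_book>
  <address>
    <name>Alice Example</name>
    <phones>
      <phone>111-11-11</phone>
      <phone>222-22-22</phone>
    </phones>
  </address>
  <address>
    <name>Bob Example</name>
    <phones>
      <phone>333-33-33</phone>
    </phones>
  </address>
</address_book>
""".strip()

root = ET.fromstring(SAMPLE)

people = []
for address in root.iter("address"):
    name = address.findtext("name")
    # iter() visits all descendants, similar to find_all() in BeautifulSoup
    numbers = [phone.text.strip() for phone in address.iter("phone")]
    people.append({name: numbers})

print(people)
```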

## Lab 4

### JSON

1.1 Read the file `contributors_sample.json`. Using the `json` module, convert its contents into the corresponding Python objects. Print the information about the first 3 users.

In [6]:
from typing import List, Dict, Optional, Tuple
import pandas as pd
from datetime import datetime

In [7]:
class User:
    """A user (contributor) record; every created instance is registered in USERS_LIST."""

    USERS_LIST = []

    def __init__(self, username: str, name: str, sex: str, address: str, mail: str, jobs: List[str], id: int) -> None:
        self._username = username
        self._name = name
        self._sex = sex
        self._address = address
        self._mail = mail
        self._jobs = jobs
        self._id = id
        User.USERS_LIST.append(self)

    @staticmethod
    def find_user(search_username: str) -> "User":
        for user in User.USERS_LIST:
            if user.username == search_username:
                return user
        raise ValueError(f"User with username {search_username} is not in the database")

    @staticmethod
    def count_male_female_users() -> Tuple[int, int]:
        # Returns (number of men, number of women)
        data_dict = {"F": 0, "M": 0}
        for user in User.USERS_LIST:
            data_dict[user.sex] += 1
        return data_dict["M"], data_dict["F"]

    @staticmethod
    def df_gen() -> pd.DataFrame:
        data_list = []
        for user in User.USERS_LIST:
            data_list.append({"id": user.id, "username": user.username, "sex": user.sex})
        result = pd.DataFrame(data_list)
        return result.set_index("id")

    @staticmethod
    def is_id_exists(search_id: int) -> bool:
        for user in User.USERS_LIST:
            if user.id == search_id:
                return True
        return False

    @property
    def id(self):
        return self._id

    @property
    def mail_domain(self):
        domain = self._mail.split("@")
        return domain[1]

    @property
    def username(self):
        return self._username

    @property
    def sex(self):
        return self._sex

    def __str__(self) -> str:
        jobs_str = ", ".join(self._jobs)
        return f"----\nId: {self._id}\nUsername: {self._username}\nName: {self._name}\nSex: {self._sex}\naddress: {self._address}\nmail: {self._mail}\njobs: {jobs_str}\n"

In [8]:
with open("./data/contributors_sample.json", "r") as file:
    data = json.load(file)

users_list = [User(**item) for item in data]
for user in users_list[:3]:
    print(user)

----
Id: 35193
Username: uhebert
Name: Lindsey Nguyen
Sex: F
address: 01261 Cameron Spring
Taylorfurt, AK 97791
mail: jsalazar@gmail.com
jobs: Energy engineer, Engineer, site, Environmental health practitioner, Biomedical scientist, Jewellery designer

----
Id: 91970
Username: vickitaylor
Name: Cheryl Lewis
Sex: F
address: 66992 Welch Brooks
Marshallshire, ID 56004
mail: bhudson@gmail.com
jobs: Music therapist, Volunteer coordinator, Designer, interior/spatial

----
Id: 1848091
Username: sheilaadams
Name: Julia Allen
Sex: F
address: Unit 1632 Box 2971
DPO AE 23297
mail: darren44@yahoo.com
jobs: Management consultant, Engineer, structural, Lecturer, higher education, Theatre manager, Designer, textile



1.2 Print the unique email domains found in the users' email addresses.

In [9]:
domains = {user.mail_domain for user in users_list}
domains

{'gmail.com', 'hotmail.com', 'yahoo.com'}

1.3 Write a function that finds a person by `username` and prints information about them. If no user with the given `username` exists, raise a `ValueError`.

In [10]:
print(User.find_user("uhebert"))

----
Id: 35193
Username: uhebert
Name: Lindsey Nguyen
Sex: F
address: 01261 Cameron Spring
Taylorfurt, AK 97791
mail: jsalazar@gmail.com
jobs: Energy engineer, Engineer, site, Environmental health practitioner, Biomedical scientist, Jewellery designer



In [11]:
try:
    print(User.find_user("meow"))
except ValueError as e:
    print(e)

User with username meow is not in the database


1.4 Count how many men and women are present in this dataset.

In [12]:
male, female = User.count_male_female_users()
print(f"Men: {male}")
print(f"Women: {female}")

Men: 2064
Women: 2136
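
The same tally can be sketched with `collections.Counter` from the standard library. The records below are made up, and only the `sex` field is assumed to match the JSON file.

```python
from collections import Counter

# Hypothetical records; only the "sex" field matters for the tally
records = [{"sex": "F"}, {"sex": "M"}, {"sex": "F"}]

counts = Counter(record["sex"] for record in records)
print(counts["M"], counts["F"])
```

A `Counter` returns 0 for keys it has never seen, so an unexpected value in the data would be counted under its own key instead of raising a `KeyError`.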


1.5 Create a `pd.DataFrame` named `contributors` with the columns `id`, `username`, and `sex`.

In [13]:
contributors = User.df_gen()
contributors

id,username,sex
35193,uhebert,F
91970,vickitaylor,F
1848091,sheilaadams,F
50969,nicole82,F
676820,jean67,M
...,...,...
423555,stevenspencer,F
35251,rwilliams,M
135887,lmartinez,F
212714,brendahill,M


1.6 Load the data from `recipes_sample.csv` (__Lab 2__) into the table `recipes`. Merge `recipes` with the `contributors` table, keeping the rows for which information about the person is missing from the JSON file. For how many people is the information missing?

In [14]:
recipes = pd.read_csv("./data/recipes_sample.csv", sep=",", parse_dates=['submitted'])
recipes = recipes.set_index("id")
recipes

id,name,minutes,contributor_id,submitted,n_steps,description,n_ingredients
44123,george s at the cove black bean soup,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
67664,healthy for them yogurt popsicles,10,91970,2003-07-26,,my children and their friends ask for my homem...,
38798,i can t believe it s spinach,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
35173,italian gut busters,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
84797,love is in the air beef fondue sauces,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...
267661,zurie s holey rustic olive and cheddar bread,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
386977,zwetschgenkuchen bavarian plum cake,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
103312,zwiebelkuchen southwest german onion cake,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
486161,zydeco soup,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


In [15]:
contributors

id,username,sex
35193,uhebert,F
91970,vickitaylor,F
1848091,sheilaadams,F
50969,nicole82,F
676820,jean67,M
...,...,...
423555,stevenspencer,F
35251,rwilliams,M
135887,lmartinez,F
212714,brendahill,M


In [16]:
buf = pd.merge(recipes, contributors, left_on=['contributor_id'], right_on=['id'], how="left")
my_groupby = buf.groupby("contributor_id")["username"].count()
result = my_groupby[my_groupby == 0]
result.shape[0]

4204
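
An alternative way to count contributors without a match is the `indicator` argument of `merge`, which marks the origin of every row. The sketch below runs on toy frames; all ids, names, and usernames in it are made up.

```python
import pandas as pd

# Toy data; ids and usernames are hypothetical
recipes = pd.DataFrame({
    "name": ["soup", "cake", "bread", "stew"],
    "contributor_id": [1, 1, 2, 3],
})
contributors = pd.DataFrame({"id": [1, 4], "username": ["alice", "dave"]})

merged = recipes.merge(
    contributors.rename(columns={"id": "contributor_id"}),
    on="contributor_id",
    how="left",      # keep every recipe, matched or not
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Recipes whose contributor has no record, reduced to distinct people
missing_ids = merged.loc[merged["_merge"] == "left_only", "contributor_id"]
n_missing = missing_ids.nunique()
print(n_missing)
```

Counting distinct ids matters because one contributor can own several recipes; counting rows would answer a different question.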

In [17]:
my_groupby

contributor_id
1530           0
1533           0
1534           0
1535          40
1538           0
              ..
2001968497     2
2002059754     1
2002234079     0
2002234259     0
2002247884     1
Name: username, Length: 8404, dtype: int64

In [18]:
result

contributor_id
1530          0
1533          0
1534          0
1538          0
1540          0
             ..
2001624050    0
2001712841    0
2001722259    0
2002234079    0
2002234259    0
Name: username, Length: 4204, dtype: int64

In [19]:
# Extra check: none of the ids counted as missing may belong to a loaded user

for index, value in result.items():
    assert not User.is_id_exists(index)

### pickle

In [20]:
import pickle

2.1 Using the file `contributors_sample.json`, build a dictionary of the following form:
```
{
    job title: [list of usernames of people who held this job]
}
```

In [21]:
with open("data/contributors_sample.json", "r") as file:
    contributors_sample = json.load(file)

result_dict = {}
for user in contributors_sample:
    for job in user["jobs"]:
        if job not in result_dict:
            result_dict[job] = []
        result_dict[job].append(user["username"])
        
result_dict

{'Energy engineer': ['uhebert',
  'annmoore',
  'garysilva',
  'martinezashley',
  'sextonsheila',
  'pjames',
  'smithjonathan',
  'wardjames',
  'cwheeler',
  'ucarlson',
  'robert71',
  'johnsontheresa',
  'amanda41',
  'stacey47',
  'timothynelson',
  'timothynelson',
  'rogersmichael',
  'melissa94',
  'wmcdaniel',
  'charles74',
  'smithjennifer',
  'clintonjones'],
 'Engineer, site': ['uhebert',
  'nancy12',
  'andrea03',
  'catherineross',
  'wesley32',
  'natalieross',
  'rossdoris',
  'christophersmith',
  'dbooker',
  'ericarobertson',
  'trantricia',
  'tpugh',
  'jasonvelez',
  'samantha36',
  'brandidaniels',
  'tenglish',
  'reyesbrett',
  'austin18',
  'vjohnson',
  'zmejia',
  'daniel04',
  'cynthia20',
  'morgan15',
  'avaldez',
  'jessica92',
  'laurieholloway',
  'baileyvictoria'],
 'Environmental health practitioner': ['uhebert',
  'jonathanchristian',
  'xjohnson',
  'dsmith',
  'james01',
  'nancytaylor',
  'ztaylor',
  'andrewwoods',
  'susan54',
  'fmaldonado',
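
The grouping loop can also be sketched with `collections.defaultdict`, which removes the explicit membership check. The records below are made up; only the `username` and `jobs` fields are assumed to match the file.

```python
from collections import defaultdict

# Hypothetical records with the two fields used above
records = [
    {"username": "alice", "jobs": ["Engineer", "Designer"]},
    {"username": "bob", "jobs": ["Engineer"]},
]

job_people = defaultdict(list)
for record in records:
    for job in record["jobs"]:
        # A missing key is created with an empty list on first access
        job_people[job].append(record["username"])

# Convert to a plain dict before serializing or comparing
job_people = dict(job_people)
print(job_people)
```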

2.2 Save the result to `job_people.pickle` and to `job_people.json` using the pickle and JSON formats respectively. Compare the sizes of the resulting files. When saving to JSON, specify the `indent` argument.

In [22]:
with open('./out/job_people.pickle', 'wb') as file:
    pickle.dump(result_dict, file)

In [23]:
with open('./out/job_people.json', 'w') as file:
    json.dump(result_dict, file, indent=4)
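
The task also asks for a size comparison, which can be done with `os.path.getsize`. The sketch below writes a small made-up dictionary into a temporary directory so that it does not depend on the notebook's files; the compact JSON variant is added only to show how much `indent` contributes.

```python
import json
import os
import pickle
import tempfile

# Hypothetical stand-in for the job -> usernames dictionary
data = {"Engineer": ["alice", "bob"], "Designer": ["alice"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "job_people.pickle")
    json_path = os.path.join(tmp_dir, "job_people.json")
    compact_path = os.path.join(tmp_dir, "job_people_compact.json")

    with open(pickle_path, "wb") as file:
        pickle.dump(data, file)
    with open(json_path, "w") as file:
        json.dump(data, file, indent=4)
    with open(compact_path, "w") as file:
        json.dump(data, file)

    # Sizes must be read before the temporary directory is removed
    sizes = {
        "pickle": os.path.getsize(pickle_path),
        "json (indent=4)": os.path.getsize(json_path),
        "json (compact)": os.path.getsize(compact_path),
    }

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

For the notebook's own files, the same `os.path.getsize` call can be pointed at `./out/job_people.pickle` and `./out/job_people.json`.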

2.3 Read the file `job_people.pickle` and demonstrate that the data was read correctly.

In [24]:
with open('./out/job_people.pickle', 'rb') as file:
    test_dict = pickle.load(file)

In [25]:
assert test_dict == result_dict

In [26]:
with open('./out/job_people.json', 'r') as file:
    test_dict = json.load(file)

In [27]:
assert test_dict == result_dict

### XML

3.1 Using the file `steps_sample.xml`, build a dictionary with the steps of each recipe of the form `{recipe_id: ["step1", "step2"]}`. Save this dictionary to the file `steps_sample.json`.

In [28]:
with open("./data/steps_sample.xml", "r") as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')


result_dict = {}
for recipe in soup.find_all('recipe'):
    recipe_id = int(recipe.find("id").get_text())
    result_dict[recipe_id] = []
    
    for step in recipe.find_all("step"):
        text = step.get_text()
        result_dict[recipe_id].append(text)

result_dict

{44123: ['in 1 / 4 cup butter , saute carrots , onion , celery and broccoli stems for 5 minutes',
  'add thyme , oregano and basil',
  'saute 5 minutes more',
  'add wine and deglaze pan',
  'add hot chicken stock and reduce by one-third',
  'add worcestershire sauce , tabasco , smoked chicken , beans and broccoli florets',
  'simmer 5 minutes',
  'add cream , simmer 5 minutes more and season to taste',
  'drop in remaining butter , piece by piece , stirring until melted and serve immediately',
  'smoked chicken: on a covered grill , slightly smoke boneless chicken , cooking to medium rare',
  'chef meskan uses applewood chips and does not allow the grill to become too hot'],
 67664: ['mix all the ingredients using a blender',
  'pour into popsicle molds',
  'freeze and enjoy !'],
 38798: ['combine all ingredients in a large bowl and mix well',
  'shape into one-inch balls',
  'cover and refrigerate or freeze until ready to bake',
  'preheat oven to 350 degrees',
  'place on ungreased 
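
The cell above builds the dictionary but does not write it to `steps_sample.json`. Saving works the same way as in task 2.2, with one caveat: JSON object keys are always strings, so integer recipe ids come back as strings after loading. A minimal sketch on an abbreviated two-recipe sample, using `json.dumps` and `json.loads` in place of the file-based `json.dump` and `json.load` to stay self-contained:

```python
import json

# Abbreviated sample; the real dictionary holds every step of every recipe
steps = {44123: ["saute 5 minutes more"], 67664: ["freeze and enjoy !"]}

text = json.dumps(steps, indent=4)
loaded = json.loads(text)

# Integer keys were converted to strings during serialization
print(list(loaded))

# Restore the integer ids if the rest of the code expects them
restored = {int(recipe_id): recipe_steps for recipe_id, recipe_steps in loaded.items()}
print(restored == steps)
```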

3.2 Using the file `steps_sample.xml`, build a dictionary of the following form: `number_of_steps_in_recipe: [list_of_recipe_ids]`

In [29]:
with open("./data/steps_sample.xml", "r") as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')


result_dict = {}
for recipe in soup.find_all('recipe'):
    recipe_id = int(recipe.find("id").get_text())
    steps_count = len(recipe.find_all("step"))
    
    if steps_count not in result_dict:
        result_dict[steps_count] = []
    result_dict[steps_count].append(recipe_id)
        
result_dict

{11: [44123,
  302399,
  375376,
  140610,
  374703,
  111198,
  257111,
  432661,
  114204,
  63069,
  165096,
  33947,
  250024,
  330512,
  315233,
  25259,
  331174,
  407621,
  263019,
  112853,
  383729,
  13709,
  336166,
  143286,
  387284,
  290003,
  370746,
  34833,
  11975,
  426211,
  373582,
  88845,
  456968,
  14149,
  507927,
  73602,
  91981,
  175109,
  390933,
  193208,
  83893,
  243008,
  259789,
  303926,
  410920,
  446605,
  32571,
  74419,
  308056,
  78497,
  111963,
  361181,
  302640,
  356655,
  53743,
  57771,
  420689,
  74520,
  50851,
  176277,
  266814,
  27897,
  189207,
  138771,
  279797,
  177831,
  32515,
  256842,
  95295,
  383349,
  109791,
  332641,
  116993,
  173126,
  187872,
  177681,
  249006,
  314834,
  283033,
  117084,
  49202,
  284916,
  247657,
  313162,
  424727,
  227557,
  431305,
  263038,
  439979,
  443041,
  241042,
  258779,
  66965,
  200503,
  109597,
  503121,
  290595,
  401175,
  169146,
  282228,
  316435,
  248582,


3.3 Get the list of recipes whose steps contain information about time (hours or minutes). To select the matching recipes, look at the attributes of the corresponding tags.

In [30]:
with open("./data/steps_sample.xml", "r") as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')

result_set = set()
for recipe in soup.find_all('recipe'):
    recipe_id = int(recipe.find("id").get_text())
    # A separate loop variable avoids shadowing the outer `recipe`
    for step in recipe.find_all("step"):
        if step.has_attr('has_hours') or step.has_attr('has_minutes'):
            result_set.add(recipe_id)

print(len(result_set))
list(result_set)

23469


[524289,
 131082,
 131087,
 131090,
 262166,
 131096,
 131107,
 131109,
 262188,
 48,
 55,
 262207,
 66,
 131138,
 262214,
 393286,
 262219,
 131149,
 91,
 94,
 131173,
 131185,
 393340,
 131206,
 262285,
 262293,
 153,
 393375,
 524456,
 176,
 181,
 262325,
 262327,
 186,
 262330,
 393409,
 262340,
 203,
 131275,
 262348,
 524495,
 393433,
 131295,
 224,
 393448,
 131311,
 240,
 262386,
 246,
 131322,
 262400,
 393496,
 288,
 289,
 131364,
 131385,
 314,
 318,
 321,
 393538,
 131408,
 337,
 393554,
 131423,
 131429,
 360,
 373,
 378,
 379,
 381,
 262526,
 262531,
 524675,
 131461,
 393609,
 131471,
 262550,
 131483,
 262564,
 393637,
 131497,
 262577,
 262585,
 393658,
 445,
 524744,
 262605,
 465,
 469,
 131542,
 262625,
 262627,
 131558,
 131567,
 524789,
 502,
 504,
 131587,
 131602,
 393750,
 544,
 131618,
 131623,
 393772,
 561,
 131637,
 393791,
 393794,
 393796,
 393801,
 131658,
 587,
 393807,
 131664,
 131671,
 262761,
 131693,
 393846,
 262777,
 131706,
 635,
 393854,
 26278
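
The attribute test can be tried on a small inline document. The sketch below uses the standard-library `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra packages; the attribute names `has_hours` and `has_minutes` come from the cell above, while the document layout and the recipe ids are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the real file may be organized differently
SAMPLE = """
<recipes>
  <recipe>
    <id>1</id>
    <steps>
      <step has_minutes="1">simmer 5 minutes</step>
      <step>serve</step>
    </steps>
  </recipe>
  <recipe>
    <id>2</id>
    <steps>
      <step>mix well</step>
    </steps>
  </recipe>
</recipes>
""".strip()

root = ET.fromstring(SAMPLE)

# A recipe qualifies as soon as any of its steps carries a time attribute
timed_ids = sorted(
    int(recipe.findtext("id"))
    for recipe in root.iter("recipe")
    if any(
        "has_hours" in step.attrib or "has_minutes" in step.attrib
        for step in recipe.iter("step")
    )
)
print(timed_ids)
```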

3.4 Load the data from `recipes_sample.csv` (__Lab 2__) into the table `recipes`. For rows with missing values in the `n_steps` column, fill this column using the file `steps_sample.xml`. Leave rows where `n_steps` is already filled unchanged.

In [31]:
recipes = pd.read_csv("./data/recipes_sample.csv", sep=",", parse_dates=['submitted'])
recipes = recipes.set_index("id")
recipes

id,name,minutes,contributor_id,submitted,n_steps,description,n_ingredients
44123,george s at the cove black bean soup,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
67664,healthy for them yogurt popsicles,10,91970,2003-07-26,,my children and their friends ask for my homem...,
38798,i can t believe it s spinach,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
35173,italian gut busters,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
84797,love is in the air beef fondue sauces,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...
267661,zurie s holey rustic olive and cheddar bread,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
386977,zwetschgenkuchen bavarian plum cake,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
103312,zwiebelkuchen southwest german onion cake,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
486161,zydeco soup,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


In [32]:
id_steps_dict = {}
with open("./data/steps_sample.xml", "r") as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')
for recipe in soup.find_all('recipe'):
    recipe_id = int(recipe.find("id").get_text())
    steps_count = len(recipe.find_all("step"))
    id_steps_dict[recipe_id] = steps_count

id_steps_dict

{44123: 11,
 67664: 3,
 38798: 5,
 35173: 7,
 84797: 4,
 44045: 6,
 107229: 8,
 95926: 4,
 453467: 12,
 306168: 6,
 50662: 15,
 118843: 3,
 69190: 5,
 503475: 10,
 149593: 10,
 200148: 18,
 310570: 38,
 95534: 10,
 109818: 7,
 66932: 7,
 226001: 12,
 125195: 5,
 141939: 13,
 250883: 14,
 120297: 14,
 147477: 3,
 223349: 7,
 60938: 10,
 302399: 11,
 342620: 9,
 296983: 14,
 166089: 14,
 129581: 33,
 116741: 2,
 325714: 6,
 276594: 6,
 487173: 30,
 289671: 6,
 44050: 2,
 447429: 24,
 137701: 18,
 292568: 2,
 299989: 14,
 63346: 7,
 342619: 9,
 383120: 10,
 367987: 3,
 463219: 8,
 39172: 8,
 216068: 3,
 173730: 28,
 287778: 9,
 437637: 10,
 123115: 14,
 371549: 8,
 376813: 9,
 134085: 4,
 390230: 34,
 401605: 7,
 306590: 5,
 303944: 13,
 299968: 13,
 192542: 4,
 147563: 9,
 193719: 16,
 38852: 9,
 250232: 10,
 134787: 7,
 437219: 9,
 77380: 5,
 21357: 7,
 198343: 9,
 129919: 12,
 375376: 11,
 152534: 8,
 63131: 6,
 24760: 8,
 327979: 2,
 375362: 3,
 217296: 13,
 121107: 12,
 435816: 29,
 

In [33]:
# Fill only the missing values: fillna aligns the Series built from the dictionary
# with the recipe id index, and a single assignment avoids chained indexing
recipes["n_steps"] = recipes["n_steps"].fillna(pd.Series(id_steps_dict))
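
A minimal sketch of index-aligned filling with `Series.fillna` on toy data: only the gaps are replaced, and values that are already present stay untouched. The frame below is made up; the count of 99 for recipe 84797 is deliberately wrong to show that an existing value is not overwritten.

```python
import pandas as pd

nan = float("nan")

# Toy table indexed by recipe id, with two gaps in n_steps
recipes = pd.DataFrame(
    {"n_steps": [nan, 4.0, nan]},
    index=pd.Index([44123, 84797, 67664], name="id"),
)

# Hypothetical lookup table: recipe id -> number of steps
step_counts = {44123: 11, 67664: 3, 84797: 99}

# fillna matches the lookup values to rows by index label
filled = recipes["n_steps"].fillna(pd.Series(step_counts))

# Convert to integers only when no gaps remain
if not filled.isna().any():
    recipes["n_steps"] = filled.astype(int)

print(recipes)
```

An id that is absent from the lookup table stays `NaN` instead of raising a `KeyError`, which is why the check for remaining gaps comes before the integer conversion.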


3.5 Check whether the `n_steps` column contains missing values. If not, convert it to an integer type and save the result to `recipes_sample_with_filled_nsteps.csv`.

In [34]:
recipes_without_steps = recipes[recipes["n_steps"].isna()]
recipes_without_steps.shape[0]

0

In [35]:
recipes_without_steps

id,name,minutes,contributor_id,submitted,n_steps,description,n_ingredients


In [36]:
recipes.dtypes

name                      object
minutes                    int64
contributor_id             int64
submitted         datetime64[ns]
n_steps                  float64
description               object
n_ingredients            float64
dtype: object

In [37]:
recipes["n_steps"] = recipes["n_steps"].astype(int)

In [38]:
recipes.dtypes

name                      object
minutes                    int64
contributor_id             int64
submitted         datetime64[ns]
n_steps                    int64
description               object
n_ingredients            float64
dtype: object

In [39]:
recipes

id,name,minutes,contributor_id,submitted,n_steps,description,n_ingredients
44123,george s at the cove black bean soup,90,35193,2002-10-25,11,an original recipe created by chef scott meska...,18.0
67664,healthy for them yogurt popsicles,10,91970,2003-07-26,3,my children and their friends ask for my homem...,
38798,i can t believe it s spinach,30,1533,2002-08-29,5,"these were so go, it surprised even me.",8.0
35173,italian gut busters,45,22724,2002-07-27,7,my sister-in-law made these for us at a family...,
84797,love is in the air beef fondue sauces,25,4470,2004-02-23,4,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...
267661,zurie s holey rustic olive and cheddar bread,80,200862,2007-11-25,16,this is based on a french recipe but i changed...,10.0
386977,zwetschgenkuchen bavarian plum cake,240,177443,2009-08-24,22,"this is a traditional fresh plum cake, thought...",11.0
103312,zwiebelkuchen southwest german onion cake,75,161745,2004-11-03,10,this is a traditional late summer early fall s...,
486161,zydeco soup,60,227978,2012-08-29,7,this is a delicious soup that i originally fou...,


In [40]:
recipes.to_csv("./out/recipes_sample_with_filled_nsteps.csv")