# The chocolate cake project

What characterizes the different types of chocolate cake: mi-cuit, fondant, moelleux, cake and cupcake?

Here are the steps:
1. I first retrieve the information I need from marmiton.org via web scraping.
2. I build a DataFrame from it.
3. I analyse the data and classify the recipes with support-vector machines (SVM).

## 1 Webscraping of marmiton.org

### 1.1 Webscraping of one recipe




We first retrieve the information from a single recipe page, to see what is going on before automating the process for all the recipes.

<img src="final.png" width="400">

We import the libraries:

In [2]:
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import re  # the standard-library module is enough for the patterns used here


In [5]:
r = requests.get("https://www.marmiton.org/recettes/recette_moelleux-au-chocolat_17982.aspx")

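# Convert to a BeautifulSoup object; naming the parser explicitly avoids
# bs4's "no parser was explicitly specified" warning
soup = bs(r.content, "html.parser")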

# Print out the HTML
contents = soup.prettify()
#print(contents)

We use the browser's inspection tools on the web page to find the elements of interest:

1. Name of recipe
2. User rating
3. Number of comments
4. Difficulty
5. Time
6. Ingredients and quantity

In [44]:
# The class names below look auto-generated, so they may change when the site is updated.
name = soup.find("h1", class_="SHRD__sc-10plygc-0 itJBWW").get_text()
print("Name of recipe: "+name)

note = soup.find("span", class_="SHRD__sc-10plygc-0 jHwZwD").get_text()
print("Note by users: "+note)

# Find the "N commentaires" text, then strip the list punctuation and the word itself
number_comments = soup.find_all("span", class_="SHRD__sc-10plygc-0 cAYPwA")
number_comments = [nb(string=re.compile("commentaires")) for nb in number_comments]
number_comments = str(number_comments).replace(' commentaires','').replace('\'', '').replace('[','').replace(']','')
print("Number of comments:"+number_comments)

# Summary line: time, difficulty, price ("\xa0" is a non-breaking space)
difficulty = soup.find_all("p", class_="RCP__sc-1qnswg8-1 iDYkZP")
diff = [row.get_text().replace("\xa0", " ") for row in difficulty]
print(diff)

# Detailed times: total, preparation, resting, cooking
time = soup.find_all("span", class_="SHRD__sc-10plygc-0 bzAHrL")
ti = [row.get_text().replace("\xa0", " ") for row in time]
print(ti)

ingr = soup.find_all("span", {'class': ['RCP__sc-8cqrvd-3 cDbUWZ', 'RCP__sc-8cqrvd-3 itCXhd']})
ingredient = [row.get_text().replace("\xa0", " ") for row in ingr]
print(ingredient)

quantity = soup.find_all("span", class_="SHRD__sc-10plygc-0 epviYI")
quant = [row.get_text().replace("\xa0", " ") for row in quantity]
print(quant)

Name of recipe: Moelleux au chocolat
Note by users: 4.8/5
Number of comments:356
['25 min', 'très facile', 'bon marché']
['25 min', '15 min', '-', '10 min']
['chocolat', 'beurre', 'sucre glace', 'farine', 'oeufs']
['250 g', '175 g', '125 g', '75 g', '5']
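
All the scraped values are still strings ('4.8/5', '25 min', '250 g'), which will need converting to numbers before any analysis. The following is a minimal sketch of such a conversion, not part of the original notebook: the helper names `parse_rating`, `parse_minutes` and `parse_quantity` are hypothetical, and the "1 h 10 min" duration format is an assumption about how longer times are displayed. Fractions such as "1/2" are not handled.

```python
import re

def parse_rating(text):
    """Turn a rating such as '4.8/5' into a float (4.8)."""
    return float(text.split("/")[0].replace(",", "."))

def parse_minutes(text):
    """Turn '25 min' (or, by assumption, '1 h 10 min') into minutes; '-' gives None."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*min", text)
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

def parse_quantity(text):
    """Split '250 g' into (250.0, 'g'); a bare count such as '5' gives (5.0, '')."""
    match = re.match(r"\s*(\d+(?:[.,]\d+)?)\s*(.*)", text)
    if match is None:
        return None, text.strip()
    return float(match.group(1).replace(",", ".")), match.group(2).strip()

print(parse_rating("4.8/5"))
print([parse_minutes(t) for t in ["25 min", "15 min", "-", "10 min"]])
print([parse_quantity(q) for q in ["250 g", "175 g", "125 g", "75 g", "5"]])
```

Returning `None` for the "-" placeholder keeps missing durations distinguishable from a duration of zero.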


We want to store this information in a DataFrame, using the following data structure:

<img src="dataframe_2.png" width="900">


In [45]:
# Scalar fields: a single row with index 0
d = {'name': name, 'note': note, 'number_comments': number_comments}
df = pd.DataFrame(data=d, index=[0])

# One column per ingredient, holding its quantity
df_quant = pd.DataFrame(quant).transpose()
df_quant.columns = ingredient

# The first entry of diff is a time that df_ti already provides, so it is dropped
df_diff = pd.DataFrame(diff).transpose()
df_diff = df_diff.rename(columns={0: "time_prep", 1: "difficulty", 2: "price"}).drop("time_prep", axis=1)

df_ti = pd.DataFrame(ti).transpose()
df_ti = df_ti.rename(columns={0: "time_tot", 1: "time_prep", 2: "time_repos", 3: "time_cooking"})

# Concatenate the pieces side by side into one single-row DataFrame
frames = [df, df_diff, df_ti, df_quant]
result = pd.concat(frames, axis=1)

display(result)


| | name | note | number_comments | difficulty | price | time_tot | time_prep | time_repos | time_cooking | chocolat | beurre | sucre glace | farine | oeufs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Moelleux au chocolat | 4.8/5 | 356 | très facile | bon marché | 25 min | 15 min | - | 10 min | 250 g | 175 g | 125 g | 75 g | 5 |
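
When this is repeated for many recipes, each one will bring its own set of ingredient columns. One possible way to prepare for that, sketched here as an assumption rather than as the notebook's method, is to assemble each recipe into a flat dict first. The function name `build_recipe_row` is hypothetical; the arguments are the variables extracted above.

```python
def build_recipe_row(name, note, number_comments, diff, ti, ingredient, quant):
    """Assemble the scraped values into one flat dict (one future DataFrame row)."""
    row = {"name": name, "note": note, "number_comments": number_comments}
    # diff holds [a time, difficulty, price]; the time is skipped because
    # ti already contains the detailed times.
    row["difficulty"], row["price"] = diff[1], diff[2]
    time_columns = ["time_tot", "time_prep", "time_repos", "time_cooking"]
    row.update(zip(time_columns, ti))
    # One key per ingredient, holding its quantity.
    row.update(zip(ingredient, quant))
    return row

row = build_recipe_row(
    "Moelleux au chocolat", "4.8/5", "356",
    ["25 min", "très facile", "bon marché"],
    ["25 min", "15 min", "-", "10 min"],
    ["chocolat", "beurre", "sucre glace", "farine", "oeufs"],
    ["250 g", "175 g", "125 g", "75 g", "5"],
)
print(row)
```

A list of such dicts can then be passed to `pd.DataFrame(...)`, which takes the union of the keys as columns and leaves a missing value wherever a recipe lacks an ingredient.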


### 1.2 Getting the number of result pages for a given search

Next, we need the number of result pages for a given search. For example, searching for "moelleux chocolat" returns 21 pages.

We want to retrieve this number automatically.

<img src="search.png" width="500">



In [49]:
u = "https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat"
# Each pagination entry holds a page number; append it to the search URL
links = bs(requests.get(u).content, "html.parser").find_all("div", class_="SHRD__sc-1ymbfjb-0 fNHOUq")
page = [u+"&page="+link.get_text() for link in links]
print(page)

['https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=2', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=3', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=4', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=5', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=6', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=7', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=8', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=9', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=10', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=20', 'https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat&page=21']
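
The pagination bar only lists some of the pages (2 to 10, then 20 and 21 in the output above), so pages 11 to 19 are missing from this list. A minimal sketch of how the full list could be rebuilt from the largest page number follows. The function name `all_result_pages` is hypothetical, and the sketch assumes that the hidden pages follow the same `&page=N` pattern as the visible ones.

```python
def all_result_pages(base_url, page_labels):
    """Build the URL of every result page from the pagination labels."""
    # Keep only labels that are plain numbers (ignores arrows, ellipses, etc.)
    numbers = [int(label) for label in page_labels if label.strip().isdigit()]
    last_page = max(numbers, default=1)
    # Page 1 is the search URL itself; later pages take a "&page=N" suffix.
    return [base_url] + [f"{base_url}&page={n}" for n in range(2, last_page + 1)]

u = "https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat"
labels = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "20", "21"]
pages = all_result_pages(u, labels)
print(len(pages))  # 21
```

Here `labels` reproduces the page numbers visible in the output above; in the notebook they would come from `link.get_text()` for each pagination element.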


### 1.3 Getting the links to the recipes from a search page

In [50]:
r = requests.get("https://www.marmiton.org/recettes/recherche.aspx?aqt=moelleux-chocolat")
soup = bs(r.content, "html.parser")
links = soup.find_all("a", class_="MRTN__sc-1gofnyi-2 gACiYG")
# The href attributes are relative, so prepend the site root
actual_links = ['https://www.marmiton.org'+link['href'] for link in links]
print(actual_links)

['https://www.marmiton.org/recettes/recette_moelleux-au-chocolat_17982.aspx', 'https://www.marmiton.org/recettes/recette_cookies-aux-pepites-de-chocolat-super-moelleux_57330.aspx', 'https://www.marmiton.org/recettes/recette_veritable-moelleux-au-chocolat_12825.aspx', 'https://www.marmiton.org/recettes/recette_gateau-moelleux-marbre-vanille-chocolat_55247.aspx', 'https://www.marmiton.org/recettes/recette_le-moelleux-chocolat-d-oncle-guillaume_16996.aspx', 'https://www.marmiton.org/recettes/recette_moelleux-au-chocolat-et-noisettes_529929.aspx', 'https://www.marmiton.org/recettes/recette_moelleux-au-chocolat-coeur-fondant_165276.aspx', 'https://www.marmiton.org/recettes/recette_gateau-ultra-moelleux-au-chocolat-sans-beurre_50346.aspx', 'https://www.marmiton.org/recettes/recette_moelleux-au-chocolat-sans-beurre-sans-sucre_14748.aspx', 'https://www.marmiton.org/recettes/recette_super-moelleux-au-chocolat-pour-gourmands-et-intolerants-au-gluten_17962.aspx', 'https://www.marmiton.org/recette