# **Where to hang out in Paris?**

This notebook presents the main results of our work. It lets anyone use the functions without going through the interface.
It is divided into three sections:

I. Maps

II. Statistics

III. Scraping (Optional)


## **I. Maps**

This section lets you create the maps that the interface produces. Each cell is dedicated to one function, and an example is always provided.


The functions defined below are very similar to those in the mapping.py file of each folder. The mapping.py functions are the ones used by the interface; they are programmed to open the map directly in a new window of your web browser. The only change made here is that the instruction opening that window has been removed. To view a map, go to Outputs/Maps and open the file in a web browser. JupyterLab can do this by itself, but some code editors such as VS Code may require an additional extension.
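If you prefer to open a saved map from Python rather than from a file browser, the standard-library `webbrowser` module can do it. The sketch below is only an illustration: the helper names `map_file_url` and `open_map` are hypothetical, and the relative path in the example is an assumption to adapt to where your maps are actually saved.

```python
import webbrowser
from pathlib import Path


def map_file_url(path):
    """Return a file:// URL for a saved HTML map (hypothetical helper)."""
    return Path(path).resolve().as_uri()


def open_map(path):
    """Open a saved HTML map in the default web browser."""
    return webbrowser.open(map_file_url(path))


# Example (the path is an assumption; uncomment once the map has been saved):
# open_map("Outputs/Maps/MovieMap.html")
```

Building the URL in a separate function makes it easy to check the path before a browser window is opened.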

 We first import the main modules we are going to use:

In [None]:
import folium
import pandas as pd 
from folium.plugins import MarkerCluster
from datetime import datetime

### Cinema

The following sub-section gives the code to plot a map of all the movies shown on a given day. It uses the database DataCinema.csv created in the scraping section. Note that not every cinema in Paris appears on the map, although many do (additional details are provided in the code of the scraping section).

Moreover, the database has not been updated since 28 December 2023, so the movies shown on the map are those screened on that day. To get the movies of the current day, update the database with the code in the scraping section.


In [None]:

def HourConversion(string):
    # Session times in the database are stored as 'HH:MM'.
    return datetime.strptime(string, '%H:%M')

def MovieMapping(data, MinHour:'HHhMM', MaxHour:'HHhMM'):
    # Bounds of the time window, given as 'HHhMM' (for example '17h30').
    HeureDebut = datetime.strptime(MinHour, "%Hh%M")
    HeureFin = datetime.strptime(MaxHour, "%Hh%M")

    # Keep only the sessions that start inside the window.
    # Note: this adds a 'time' column to the DataFrame passed as argument.
    data['time'] = data['heure'].apply(HourConversion)
    AdjustedData = data[(data['time'] >= HeureDebut) & (data['time'] <= HeureFin)]

    MovieMap = folium.Map(location=[48.8566, 2.3522], zoom_start=12)
    GeoVisited = {}  # one MarkerCluster per cinema location

    for index, row in AdjustedData.iterrows():
        # 'geo' holds 'latitude,longitude' as text.
        lat, lon = (float(value) for value in row['geo'].split(',')[:2])

        content = (
            f"<h4 style='color:black;'>{row['nom']}</h4>"
            f"<p style='font-size:16px;'>{row['etablissement']}</p>"
            f"<p style='font-size:16px;'>{row['heure']}</p>"
        )

        # Create the cluster the first time a location is met, then reuse it.
        if (lat, lon) not in GeoVisited:
            GeoVisited[(lat, lon)] = MarkerCluster().add_to(MovieMap)
        # max_width is an option of the popup, not of the marker.
        folium.Marker(location=(lat, lon), popup=folium.Popup(content, max_width=500)).add_to(GeoVisited[(lat, lon)])
    MovieMap.save("/home/onyxia/work/Maps_cultural_life_Paris/Outputs/Maps/MovieMap.html")
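The time-window filter used by `MovieMapping` can be checked on its own, without building a map. The sketch below uses made-up rows (the film names and times are not taken from DataCinema.csv) and assumes the same two formats as the function: 'HHhMM' for the bounds and 'HH:MM' for the session times.

```python
from datetime import datetime

import pandas as pd

# Made-up sample rows, only to illustrate the filter.
sample = pd.DataFrame({
    "nom": ["Film A", "Film B", "Film C"],
    "heure": ["16:45", "18:00", "22:30"],
})

# Bounds of the window use the 'HHhMM' format.
start = datetime.strptime("17h30", "%Hh%M")
end = datetime.strptime("22h00", "%Hh%M")

# Session times in the table use the 'HH:MM' format.
times = sample["heure"].apply(lambda s: datetime.strptime(s, "%H:%M"))

# Keep the sessions that start inside the window (bounds included).
selected = sample[(times >= start) & (times <= end)]
print(selected["nom"].tolist())
```

Both formats carry no date, so `strptime` places every value on the same default day and the comparison only depends on the hour and minute.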


#### *Example*
We use the previous function to create a map showing all the movies screened in Paris on 28 December 2023 whose sessions begin between 5:30 PM (17h30) and 10:00 PM (22h00).

To see the final map, open the file named MovieMap.html in Outputs/Maps through your web browser.

In [None]:
program = pd.read_csv('/home/onyxia/work/Maps_cultural_life_Paris/Outputs/DataSets/DataCinema.csv')
MovieMapping(program,'17h30','22h00') 


### Theater

The following function creates a map of the plays performed on a given day. The day is passed as an argument of the function. Note that the database was last updated on 28 December 2023. If you use the function long after this date, consider running the web scraping code again.

In [None]:

def TheaterMap(data:'Pandas DataFrame', date:'YYYY-MM-DD'):

    # Parse the requested day and the run dates of each play.
    # Note: the two date columns of the DataFrame passed as argument are converted in place.
    user_date = datetime.strptime(date, '%Y-%m-%d')
    data['date début'] = pd.to_datetime(data['date début'])
    data['date fin'] = pd.to_datetime(data['date fin'])
    # Keep the plays whose run includes the requested day (bounds included).
    filtered_data = data[(data['date début'] <= user_date) & (user_date <= data['date fin'])]

    paris_coordinates = [48.8566, 2.3522]
    my_map = folium.Map(location=paris_coordinates, zoom_start=12)

    for index, row in filtered_data.iterrows():
        establishment_name = row['etablissement']
        address = row['adresse']
        show_name = row['nom']
        average_price = row['prix moyen']

        # 'Coordonnees' holds '(latitude, longitude)' as text.
        coordinates = [float(coord.strip('()')) for coord in row['Coordonnees'].split(',')]
        popup_text = f"<b>{establishment_name}</b><br>Address: {address}<br>Play: {show_name}<br>Average price: {average_price} €"
        folium.Marker(location=coordinates, popup=popup_text).add_to(my_map)

    my_map.save("/home/onyxia/work/Maps_cultural_life_Paris/Outputs/Maps/Theatermap.html")
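The date filter at the start of `TheaterMap` can be tried on a small table before running it on the full database. This is a minimal sketch with made-up plays and dates; only the column names 'date début' and 'date fin' come from the function above.

```python
from datetime import datetime

import pandas as pd

# Made-up plays, only to illustrate the date filter.
plays = pd.DataFrame({
    "nom": ["Play A", "Play B"],
    "date début": ["2024-01-05", "2024-02-01"],
    "date fin": ["2024-01-20", "2024-03-01"],
})

# The requested day must follow the 'YYYY-MM-DD' format.
day = datetime.strptime("2024-01-12", "%Y-%m-%d")
start = pd.to_datetime(plays["date début"])
end = pd.to_datetime(plays["date fin"])

# A play is kept when the chosen day falls inside its run (bounds included).
running = plays[(start <= day) & (day <= end)]
print(running["nom"].tolist())
```

A date written in another format, such as '12/01/2024', makes `strptime` raise a `ValueError` instead of silently returning a wrong selection.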


#### *Example*
We create a map of all the plays performed on 12 January 2024. To create a map for another date, make sure the date is in the 'YYYY-MM-DD' format.
To see the map, open Theatermap.html in Outputs/Maps.

In [None]:
DataTheater = pd.read_csv('/home/onyxia/work/Maps_cultural_life_Paris/Outputs/DataSets/DataTheatre_base_finale.csv', sep=';')
TheaterMap(DataTheater,'2024-01-12' )

### Concerts

This section provides the functions used to map the concerts played on a given day in Paris.

In [None]:
def ConcertMap():
    pass

#### *Example*

We plot a map with all the concerts played on...

In [None]:
# Waiting for Biviano's code

## **II. Statistics** 

In this section we review the main statistics we computed from the databases we built for this project.
We go through various topics, such as finding the most relevant places for you to hang out in Paris according to your tastes.


### Cinema
This section contains statistics computed from the databases we created or found. For cinemas, we use the open database published by the City of Paris on the cinemas of the city. The database obtained by web scraping only covers a single day, so it is hard to compute statistics from it.

The original file is located in Resources/Data and is named ListeCinema.csv.
We first import the necessary modules and read the file with pandas:



In [6]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime


In [7]:
DataCinemaParis = pd.read_csv('/home/onyxia/work/Maps_cultural_life_Paris/Ressources/Data/ListeCinema.csv', sep = ';')
DataCinemaParis = DataCinemaParis[DataCinemaParis['dep'] == 75]
DataCinemaParis.head()


#### General statistics
This section deals with general statistics made on cinemas in Paris.

In [None]:
DataCinemaParis.shape

In [None]:
print(DataCinemaParis['ecrans'].mean())
print(DataCinemaParis['fauteuils'].mean())

In [None]:
mean_per_district = DataCinemaParis.groupby('commune')['fauteuils'].mean()
print(mean_per_district)


### Theater

In [None]:
df = pd.read_csv('/home/onyxia/work/Maps_cultural_life_Paris/Outputs/DataSets/DataTheatre_base_finale.csv', sep=';')

We first look at the number of plays per district.

This first step helps us understand the cultural activity you will find in each district of Paris.


In [None]:
result_df = df.groupby('commune')['etablissement'].count().reset_index()
result_df = result_df.rename(columns={'etablissement': 'nombre_de_pieces'})
print(result_df)

In [None]:
result_df['arrondissement'] = result_df['commune'].str.extract(r'(\d+)', expand=False).astype(int)
df_classe = result_df.sort_values(by='arrondissement')
print(df_classe)
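The sort above depends on turning each district label into an integer first. The sketch below shows that step on made-up labels; the real values of the 'commune' column may be formatted differently, so the label format used here is an assumption.

```python
import pandas as pd

# Made-up labels; the real 'commune' values may be formatted differently.
communes = pd.Series([
    "Paris 11e Arrondissement",
    "Paris 2e Arrondissement",
    "Paris 18e Arrondissement",
])

# (\d+) captures the first run of digits found in each label.
numbers = communes.str.extract(r"(\d+)", expand=False).astype(int)
print(numbers.sort_values().tolist())
```

Converting to `int` matters for the ordering: sorted as text, '11' and '18' would come before '2'.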

In [None]:
fig, ax = plt.subplots(figsize=(20, max(6, len(result_df) * 0.3)))
ax.bar(result_df['commune'], result_df['nombre_de_pieces'], color='blue')
ax.set_title('Number of plays per district')
plt.tight_layout()
plt.show()

Next, we look at the number of plays per theater.

In this second step, we count the plays each theater offers from now until the end of the season (June).

In [None]:
df_pieces_par_etablissement = df.groupby('etablissement')['nom'].count().reset_index()
df_pieces_par_etablissement = df_pieces_par_etablissement.rename(columns={'nom': 'nombre_de_pieces'})
print(df_pieces_par_etablissement)

We now compute the mean price per district.

In [None]:

df['prix moyen'] = df['prix moyen'].astype(str)
# Keep the first run of digits of each price (its integer part).
df['prix moyen'] = pd.to_numeric(df['prix moyen'].str.extract(r'(\d+)')[0], errors='coerce')
df['prix moyen'] = df['prix moyen'].round(1)
df_prix_moyen = df.groupby('commune')['prix moyen'].mean().reset_index()
print(df_prix_moyen)

In [None]:
df_prix_moyen['arrondissement'] = df_prix_moyen['commune'].str.extract(r'(\d+)', expand=False).astype(int)
df_ordonne = df_prix_moyen.sort_values(by='arrondissement')
print(df_ordonne)

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

We now take a closer look at the districts of Paris to test whether their level of wealth has a significant impact on the price of theater tickets.

At first sight, it seems logical that the wealthier a district is, the higher the prices will be. But how can this be measured effectively? The wealth of a district is in fact difficult to measure. We decided to use a simple indicator, the price per square meter in each district, assuming that the higher it is, the higher the purchasing power of its inhabitants.

First step: we scraped a website with the 2023 data and saved it in the 'Prixm2' CSV file.

In [None]:
import bs4
import lxml
import pandas as pd
import urllib
from bs4 import BeautifulSoup
from urllib import request
import numpy as np

In [None]:
url = 'https://www.journaldunet.com/patrimoine/prix-immobilier/paris/ville-75056'
request_text = request.urlopen(url).read()
page = bs4.BeautifulSoup(request_text, 'html.parser')

In [None]:
rows = page.find_all('tr')
informations = []

for row in rows:
    link = row.find('a')
    columns = row.find_all('td')
    if link and len(columns) > 1:
        arrondissement = link.text.strip()
        prix = columns[1].text.strip()
        informations.append({'arrondissement': arrondissement, 'prix': prix})

df2 = pd.DataFrame(informations)

df2['prix'] = df2['prix'].str.replace(r'\D', '', regex=True).astype(int)  # removes every non-digit character (the 2 of 'm2' is kept)
df2['prix'] = df2['prix'].astype(str).str[:-1].astype(int)  # removes the last digit (the 2 of 'm2')

df2.to_csv('Prixm2.csv', index = False)
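The two cleaning steps applied to the 'prix' column can be followed on a single value. This is a minimal sketch using the standard `re` module; the helper name `clean_price` and the sample label are made up, and it assumes the scraped text ends in 'm2' written with a plain digit 2.

```python
import re


def clean_price(text):
    """Turn a scraped price label into an integer (hypothetical helper)."""
    # Keep digits only; the '2' of 'm2' survives this step.
    digits = re.sub(r"\D", "", text)
    # Drop that trailing '2' to recover the price itself.
    return int(digits[:-1])


print(clean_price("10 523 €/m2"))
```

If a label did not end in a plain '2', the second step would remove a real digit of the price, so it is worth checking a few raw values before trusting the column.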

In [None]:
# Regression of the price of the ticket on the price of the square meter


In [None]:
X=df2['prix']
y = df_ordonne['prix moyen']
X = X.values.reshape(-1, 1)
model1 = LinearRegression()
model1.fit(X,y)
y_pred = model1.predict(X)
plt.scatter(X, y)
plt.plot(X, y_pred, color='red', linewidth=1.5)
plt.xlabel('Price per m2 in the district')
plt.ylabel('Ticket price')
plt.show()

In [None]:
print("Slope:", model1.coef_)
print("Intercept:", model1.intercept_)
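The regression above pairs the two columns by position, which only works if both tables list the districts in the same order and contain the same districts. One way to avoid relying on row order is to merge the tables on the district number first. The sketch below uses made-up values and, to stay short, fits the line with `numpy.polyfit` rather than `LinearRegression`; the table names `tickets` and `housing` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values, listed in different orders, only to show the alignment step.
tickets = pd.DataFrame({"arrondissement": [1, 2, 3], "prix moyen": [26.0, 20.0, 22.0]})
housing = pd.DataFrame({"arrondissement": [3, 1, 2], "prix": [10000, 12000, 9000]})

# An inner merge pairs rows by district number, whatever their original order.
merged = tickets.merge(housing, on="arrondissement", how="inner")

# Least-squares line: ticket price = slope * price per m2 + intercept.
slope, intercept = np.polyfit(merged["prix"], merged["prix moyen"], deg=1)
print(slope, intercept)
```

With an inner merge, a district missing from either table is dropped instead of shifting every following row by one.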

Some conclusions:

1- Our initial intuition seems to be right: the higher the price per square meter, the higher the ticket price.

2- our 

In [None]:
A = df_classe['nombre_de_pieces']
b = df_ordonne['prix moyen']
A= A.values.reshape(-1, 1)
model2 = LinearRegression()
model2.fit(A,b)
b_pred = model2.predict(A)
plt.scatter(A, b)
plt.plot(A, b_pred, color='yellow', linewidth=1.5)
plt.xlabel('Number of plays in the district')
plt.ylabel('Ticket price')
plt.show()

In [None]:
print("Slope:", model2.coef_)
print("Intercept:", model2.intercept_)

### Concerts

## **III. Scraping**

We collected most of our data by web scraping. We used BeautifulSoup4 to scrape the most relevant sites for our project. An error may occur while running the code, and some cells may take a long time to run because of long API requests. We do not recommend running it unless you want to update the databases stored in the Outputs folder.
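Since requests may fail or hang, it can help to wrap each download in a small function with a timeout. The sketch below is not the project's scraping code: the helper name `fetch_html` and the 10-second default are assumptions, and it only relies on `urllib` from the standard library.

```python
from urllib import request


def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as error:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        print(f"Could not fetch {url}: {error}")
        return None
```

Returning `None` instead of raising lets a scraping loop skip a failed page and continue with the next one.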


