# Review module

**Instructions**

To complete this review module, we recommend that you follow these instructions:

1. Complete the code cells provided to you in this notebook, but do **not** change the names of the variables. If you do that, the autograder will fail and you will not receive any points.
2. Run all the answered cells before you run the testing cells. The answers must exist before they are graded!
3. In each cell, replace the line `raise NotImplementedError()` with your implementation.

## The dataset

The [`liste-des-immeubles-proteges-au-titre-des-monuments-historiques.csv`](data/liste-des-immeubles-proteges-au-titre-des-monuments-historiques.csv) file contains information about all the buildings in France that are protected by the French government as historical monuments (*monuments historiques*). Its main columns are:

* `ref`: The building ID
* `tico`: The building's common name
* `adrs`: The location of the monument (the address)
* `affe`: Government agency in charge of the monument
* `autr`: The name of the architect that built or designed the monument
* `com` and `wcom`: The [*commune*](https://en.wikipedia.org/wiki/Communes_of_France) where the building is located
* `desc`: Description
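* `dpro`: Date of protection, followed by the protection level (e.g. `1913/12/30 : classé MH`)
* `dpt_lettre`: Department (*département*) where the monument is located (e.g. `Aude`)
* `hist`: History of the monument
* `insee`: [INSEE](https://www.insee.fr/en/accueil) code of the commune where the monument is located (an identifier used for statistical purposes, shared by all monuments in the same commune)
* `ppro`: Legal details about the building's protection status
* `reg`: Region where the monument is located
* `sclx`: Century or centuries to which the monument belongs (e.g. `12e s.` for the 12th century; multiple values are separated by `;`)
* `stat`: Ownership status of the building (e.g. `propriété de la commune`, `propriété privée`)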

```python
import pandas as pd

monuments = pd.read_csv("data/liste-des-immeubles-proteges-au-titre-des-monuments-historiques.csv", sep=";")
monuments.head()
```

|   | tico | ppro | dpro | stat | desc | hist | autr | adrs | reg | dpt_lettre | ... | wadrs | wcom | code_departement | ploc | dmaj | ref | insee | contact | sclx | coordonnees_ban |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chapelle du cimetière | Chapelle du cimetière : classement par arrêté ... | 1913/12/30 : classé MH | propriété de la commune |  | Romane pour la plus grande partie, la chapelle... |  | dans le cimetière | Occitanie | Aude | ... | Dans le cimetière | Capendu | 11 | Anciennement région de : Languedoc-Roussillon | 2019-05-15 | PA00102580 | 11068 | mediatheque.patrimoine@culture.gouv.fr | 12e s.;14e s.;16e s. |  |
| 1 | Chapelle Notre-Dame-de-Santé | Chapelle Notre-Dame-de-Santé : classement par ... | 1932/12/29 : classé MH | propriété de la commune |  | La chapelle est contigüe à l'hôpital avec lequ... |  |  | Occitanie | Aude | ... |  | Carcassonne | 11 | Anciennement région de : Languedoc-Roussillon | 2019-05-15 | PA00102585 | 11069 | mediatheque.patrimoine@culture.gouv.fr | 15e s.;16e s. | 43.206303,2.344538 |
| 2 | Maison | Ensemble de l'escalier avec rampe en fer forgé... | 1948/04/13 : inscrit MH | propriété privée |  | Maison probablement reconstruite ou embellie e... |  | Pinel (rue) 5 | Occitanie | Aude | ... | 5 rue Pinel | Carcassonne | 11 | Anciennement région de : Languedoc-Roussillon | 2019-05-15 | PA00102604 | 11069 | mediatheque.patrimoine@culture.gouv.fr | 17e s. | 43.212951,2.352808 |
| 3 | Eglise paroissiale Saint-Loup | Eglise paroissiale : inscription par arrêté du... | 1948/04/13 : inscrit MH | propriété de la commune |  | L'église figure dans un acte de 1304 parmi les... |  |  | Occitanie | Aude | ... |  | Clermont-sur-Lauquet | 11 | Anciennement région de : Languedoc-Roussillon | 2019-11-05 | PA00102655 | 11094 | mediatheque.patrimoine@culture.gouv.fr | 11e s. | 43.047337,2.419943 |
| 4 | Eglise Saint-Martin | Eglise : classement par arrêté du 13 juin 1913 | 1913/06/13 : classé MH | propriété de la commune |  | Eglise romane à trois absides et trois nefs, s... |  |  | Occitanie | Aude | ... |  | Escales | 11 | Anciennement région de : Languedoc-Roussillon | 2019-11-05 | PA00102676 | 11126 | mediatheque.patrimoine@culture.gouv.fr | 12e s. | 43.226967,2.70928 |


### Exercise 1 (3 points)

Create a new column in the dataset that contains only the year in which the monument was protected. Call it `year_protection`.

**Hint:** Each `dpro` value starts with the date and continues with the protection level (e.g. `1913/12/30 : classé MH`), so isolate the date before converting it. If you choose to use the [`pd.to_datetime()`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function to convert from `str` to `datetime`, make sure to pass `errors="coerce"`, since a few rows have corrupted data (such as references to the year 20115). This argument replaces unparseable values with `NaT` ("Not a Time"), which is how `pandas` represents missing values in `datetime64` Series.
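
To see what `errors="coerce"` does, here is a minimal sketch on a hand-made Series shaped like `dpro` (the values are illustrative, not rows from the file; the `20115` entry imitates the corruption mentioned in the hint). It keeps the text before ` : ` and parses it with an explicit `format="%Y/%m/%d"`, an assumption based on the sample rows shown above.

```python
import pandas as pd

# Illustrative values shaped like the `dpro` column (not real rows from the file)
dpro = pd.Series([
    "1913/12/30 : classé MH",
    "1948/04/13 : inscrit MH",
    "20115/04/13 : inscrit MH",  # corrupted year, as described in the hint
    None,                         # missing value
])

# Keep only the date part before " : ", then convert; unparseable values become NaT
dates = pd.to_datetime(dpro.str.split(" : ").str[0], format="%Y/%m/%d", errors="coerce")
years = dates.dt.year

print(years.tolist())
```

The first two entries yield 1913 and 1948; the corrupted and missing entries become `NaT`, so their year is `NaN` and the resulting column has a float dtype.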

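```python
# YOUR CODE HERE
# `dpro` values look like "1913/12/30 : classé MH"; keep only the date before " : "
protection_dates = pd.to_datetime(monuments["dpro"].str.split(" : ").str[0], format="%Y/%m/%d", errors="coerce")
monuments["year_protection"] = protection_dates.dt.year
```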

### Exercise 2 (2 points)

Using `.apply()` and/or ordinary string methods, create a new column called `castle` in the `monuments` DataFrame whose values are `True` if the monument is a castle and `False` otherwise. To decide, check whether the value in the `tico` column contains the French word for castle, *château*.

**Hint:** There are 6,034 castles in this dataset, and 39,650 buildings that are not castles. To compute these numbers, we accounted for uppercase and lowercase variations and for the common typo *chateau* (without the circumflex). Your results should match these numbers.
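
A vectorized sketch of the same check, on a few made-up names: lowercase everything with `.str.lower()`, then look for either spelling with `.str.contains()`. The pattern `château|chateau` and the choice `na=False` (treat missing names as non-castles) are assumptions of this sketch, not requirements of the exercise.

```python
import pandas as pd

# Made-up names covering the variations mentioned in the hint
tico = pd.Series(["Château de Foix", "CHÂTEAU FORT", "Chateau d'eau", "Maison", None])

# Lowercase, then match either spelling; missing names count as "not a castle"
castle = tico.str.lower().str.contains("château|chateau", regex=True, na=False)

print(castle.tolist())
```

Keep in mind that plain substring matching also flags names such as *château d'eau* (a water tower), as the third entry shows.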

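```python
# YOUR CODE HERE
# Lowercase each name and normalize the "chateau" typo before searching for "château"
monuments["castle"] = monuments["tico"].apply(
    lambda name: "château" in name.lower().replace("chateau", "château")
)
```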

### Exercise 3 (1 point)

The `tico` column contains many duplicates. For instance, *Maison*, which means "house" in French, is repeated many times because many monuments are simply known as "houses". Create a **list** (*not* a new column in the `monuments` DataFrame) called `tico_once` that contains only the elements of the `tico` column that appear exactly *once*. A plain `.drop_duplicates()` is not enough here: it keeps one copy of every repeated value, so the result would still include names such as "maison" or "château", which appear many times.

**Hint:** Make all names lowercase, count how many times each name appears, keep only the names that appear exactly once, and assign them to the `tico_once` list. [Here](https://stackoverflow.com/a/28272238/6945498) is how to filter a Series in case you need to.
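
A small sketch of the count-then-filter idea on a made-up Series: `value_counts()` returns the counts indexed by the distinct values, so a boolean mask on the counts followed by `.index.tolist()` yields the names that occur exactly once. Lowercasing with `.str.lower()` returns a new Series, so the original column stays intact.

```python
import pandas as pd

# Made-up names: "maison" appears twice once case is normalized
tico = pd.Series(["Maison", "maison", "Château de Foix", "Eglise Saint-Martin"])

# Count each lowercased name, then keep only names seen exactly once
counts = tico.str.lower().value_counts()
names_once = counts[counts == 1].index.tolist()

print(sorted(names_once))
```

An equivalent route is `lowered[~lowered.duplicated(keep=False)].tolist()`, where `lowered` is the lowercased Series: `keep=False` marks every copy of a repeated value as a duplicate, so only the values seen once survive.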

```python
# YOUR CODE HERE
# Count each lowercased name without overwriting the original `tico` column
counts = monuments["tico"].str.lower().value_counts()
tico_once = counts[counts == 1].index.tolist()
```

## Testing cells

Run the cells below to check your answers. Run your solution cells first; otherwise, checking your answers will raise an error.

```python
# Ex 1
assert "year_protection" in monuments.columns, "Ex. 1 - You have to create a column called year_protection in the monuments DataFrame!"
print("Exercise 1 passed the preliminary sanity check.")
```

Exercise 1 passed the preliminary sanity check.


```python
# Ex 2
assert "castle" in monuments.columns, "Ex. 2 - You have to create a column called castle in the monuments DataFrame!"
print("Exercise 2 passed the preliminary sanity check.")
```

Exercise 2 passed the preliminary sanity check.


```python
# Ex 3
assert "tico_once" in globals(), "Ex. 3 - You need to create the tico_once variable!"
assert type(tico_once) == type([1,2,3]), "Ex. 3 - The tico_once variable must be of type list!"
assert len(tico_once) == 23874, "Ex. 3 - The tico_once list must have exactly 23874 elements. Are you standardizing the case? No need to correct typos or replace special characters."
print("Exercise 3 passed the preliminary sanity check.")
```

Exercise 3 passed the preliminary sanity check.


## Attribution

"Immeubles protégés au titre des Monuments Historiques", July 23, 2020, 
Ministère de la Culture, [ETALAB 2.0 license](https://www.etalab.gouv.fr/wp-content/uploads/2017/04/ETALAB-Licence-Ouverte-v2.0.pdf), https://www.data.gouv.fr/fr/datasets/immeubles-proteges-au-titre-des-monuments-historiques-2/