# Apache Spark Final Project

**Student last name:** NGOSSINGA

**Student first name:** Luc Esdras

**Class:** Engineering II


## Description
This project uses Apache Spark to analyze and process the **[San Francisco Fire Department Calls](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3)** data in order to produce a few KPIs (*Key Performance Indicators*). The **SF Fire Dataset** contains the responses of all fire units to calls. Each record includes the call number, the incident number, the address, the unit identifier, the call type, and the disposition. All relevant time intervals are also included. Because the dataset is response-based and most calls involve several units, there are multiple records for each call number. Addresses are associated with a block number, an intersection, or a call box, not with a specific address.

**For more details on the data, see this [link](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3).**

## Work to do
The goal of this project is to understand the **SF Fire Dataset** well enough to answer the questions with appropriate Spark/Scala code.

- Create a public git repository and share it with my email address (limahin10@gmail.com)
- Write readable, well-indented code
- Remember to justify your answers in the Markdown cells.


## Note:
- The project is individual: each notebook belongs to a single student.
- Deadline: **Monday, 28 February 2023 at 23:59** (no extension will be granted)

### Loading the data

Importing the Spark packages

In [None]:
import org.apache.spark.sql.types._ 
import org.apache.spark.sql.functions._ 
import spark.implicits._

Let's take a look at the structure of the data before defining a schema.

In [None]:
!head -1 "datasets/sf-fire/sf-fire-calls.csv"

Because the dataset is very large, inferring the schema would be fairly expensive: Spark has to read the data once just to guess the column types. We therefore define an explicit schema for the Dataset.
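
As a rough illustration of the difference, the sketch below reads the same CSV content twice, once with an explicit schema and once with `inferSchema`. The three-column sample, `miniSchema`, and the local `SparkSession` are assumptions made for the example; they are not part of the project data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("schema-sketch").getOrCreate()
import spark.implicits._

// Hypothetical three-column sample standing in for the real CSV file
val sampleLines = Seq(
  "CallNumber,CallType,Delay",
  "1,Alarms,2.5",
  "2,Medical Incident,7.0"
).toDS()

val miniSchema = StructType(Array(
  StructField("CallNumber", IntegerType, true),
  StructField("CallType", StringType, true),
  StructField("Delay", FloatType, true)))

// Explicit schema: the column types are known before any data is read
val explicitDF = spark.read.schema(miniSchema).option("header", "true").csv(sampleLines)

// Inferred schema: Spark scans the data to guess the types
val inferredDF = spark.read.option("header", "true").option("inferSchema", "true").csv(sampleLines)

explicitDF.printSchema()
inferredDF.printSchema()
```

Besides saving a pass over the data, the explicit schema also fixes the types: here `Delay` is read as a float, whereas inference picks a double.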

In [None]:
val fireSchema = StructType(Array(
        StructField("CallNumber", IntegerType, true),
        StructField("UnitID", StringType, true),
        StructField("IncidentNumber", IntegerType, true),
        StructField("CallType", StringType, true),                  
        StructField("CallDate", StringType, true),      
        StructField("WatchDate", StringType, true),
        StructField("ReceivedDtTm", StringType, true),
        StructField("EntryDtTm", StringType, true),
        StructField("DispachDtTm", StringType, true),
        StructField("ResponseDtTm", StringType, true),
        StructField("OnSceneDtTm", StringType, true),
        StructField("TransportDtTm", StringType, true),
        StructField("HopitalDtTm", StringType, true),
        StructField("CallFinalDisposition", StringType, true),
        StructField("AvailableDtTm", StringType, true),
        StructField("Address", StringType, true),       
        StructField("City", StringType, true),       
        StructField("Zipcode", IntegerType, true),       
        StructField("Battalion", StringType, true),                 
        StructField("StationArea", StringType, true),       
        StructField("Box", StringType, true),       
        StructField("OriginalPriority", StringType, true),       
        StructField("Priority", StringType, true),       
        StructField("FinalPriority", IntegerType, true),       
        StructField("ALSUnit", BooleanType, true),       
        StructField("CallTypeGroup", StringType, true),
        StructField("NumAlarms", IntegerType, true),
        StructField("UnitType", StringType, true),
        StructField("UnitSequenceInCallDispatch", IntegerType, true),
        StructField("FirePreventionDistrict", StringType, true),
        StructField("SupervisorDistrict", StringType, true),
        StructField("Neighborhood", StringType, true),
        StructField("Location", StringType, true),
        StructField("RowID", StringType, true),
        StructField("Delay", FloatType, true)))

/*
------------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+--------------------+---------+--------+--------------------------+----------------------+------------------+-------------------+---------------+--------------------+---------------------+-------------------+-------------------+-------------------+
|CallNumber|UnitID|IncidentNumber|            CallType|        ReceivedDtTm|           EntryDtTm|         DispachDtTm|        ResponseDtTm|         OnSceneDtTm|       TransportDtTm|         HopitalDtTm|CallFinalDisposition|             Address|         City|Zipcode|Battalion|StationArea| Box|OriginalPriority|Priority|FinalPriority|ALSUnit|       CallTypeGroup|NumAlarms|UnitType|UnitSequenceInCallDispatch|FirePreventionDistrict|SupervisorDistrict|       Neighborhood|       Location|               RowID|ResponseDelayedinMins|       IncidentDate|        OnWatchDate|      AvailableDtTS|
+----------+------+--------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+--------------------+---------+--------+--------------------------+----------------------+------------------+-------------------+---------------+--------------------+---------------------+-------------------+-------------------+-------------------+
| 221210313|   E36|      22054955|        Outside Fire|05/01/2022 02:58:...|05/01/2022 02:59:...|05/01/2022 02:59:...|05/01/2022 03:01:...|05/01/2022 03:02:...|                null|                null|                Fire|   GOUGH ST/GROVE ST|San Francisco|  94102|      B02|         36|3265|               3|       3|            3|   true|                Fire|        1|  ENGINE|                         1|                     2|                 5|       Hayes Valley|  221210313-E36|POINT (-122.42316...|                  9.0|2022-05-01 00:00:00|2022-04-30 00:00:00|2022-05-01 03:05:00|
| 220190150|   E29|      22008871|              Alarms|01/19/2022 01:42:...|01/19/2022 01:44:...|01/19/2022 01:44:...|01/19/2022 01:46:...|01/19/2022 01:49:...|                null|                null|                Fire|100 Block of MISS...|San Francisco|  94107|      B03|         29|2431|               3|       3|            3|   true|               Alarm|        1|  ENGINE|                         1|                     3|                10|       Potrero Hill|  220190150-E29|POINT (-122.39469...|                 26.0|2022-01-19 00:00:00|2022-01-18 00:00:00|2022-01-19 02:35:26|
| 211233271|   T07|      21053032|              Alarms|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:29:...|05/03/2021 09:32:...|                null|                null|                Fire|  0 Block of HOFF ST|San Francisco|  94110|      B02|         07|5236|               A|       3|            3|  false|               Alarm|        1|   TRUCK|                         2|                     2|                 9|            Mission|  211233271-T07|POINT (-122.42057...|                 20.0|2021-05-03 00:00:00|2021-05-03 00:00:00|2021-05-03 21:38:09|
| 212933533|   B02|      21127914|              Alarms|10/20/2021 10:08:...|10/20/2021 10:09:...|10/20/2021 10:10:...|10/20/2021 10:11:...|                null|                null|                null|                Fire|200 Block of JONE...|San Francisco|  94102|      B03|         03|1456|               3|       3|            3|  false|               Alarm|        1|   CHIEF|                         3|                     3|                 6|         Tenderloin|  212933533-B02|POINT (-122.41243...|                 36.0|2021-10-20 00:00:00|2021-10-20 00:00:00|2021-10-20 22:25:52|
| 221202543|   E41|      22054815|              Alarms|04/30/2022 06:35:...|04/30/2022 06:37:...|04/30/2022 06:37:...|04/30/2022 06:38:...|                null|                null|                null|                Fire|1400 Block of FIL...|San Francisco|  94109|      B04|         16|3146|               3|       3|            3|  false|               Alarm|        1|  ENGINE|                         4|                     4|                 2|       Russian Hill|  221202543-E41|POINT (-122.42333...|                 32.0|2022-04-30 00:00:00|2022-04-30 00:00:00|2022-04-30 18:40:08|
| 211232439|   B01|      21052945|              Alarms|05/03/2021 04:57:...|05/03/2021 04:58:...|05/03/2021 04:58:...|05/03/2021 05:00:...|                null|                null|                null|                Fire|500 Block of JONE...|San Francisco|  94102|      B01|         03|1462|               3|       3|            3|  false|               Alarm|        1|   CHIEF|                         2|                     1|                 6|         Tenderloin|  211232439-B01|POINT (-122.41299...|                 36.0|2021-05-03 00:00:00|2021-05-03 00:00:00|2021-05-03 17:05:20|
| 211942517|   T03|      21083057|              Alarms|07/13/2021 04:50:...|07/13/2021 04:51:...|07/13/2021 04:51:...|                null|                null|                null|                null|                Fire|900 Block of VAN ...|San Francisco|  94109|      B04|         03|3162|               3|       3|            3|  false|               Alarm|        1|   TRUCK|                         3|                     2|                 6|         Tenderloin|  211942517-T03|POINT (-122.42090...|                 36.0|2021-07-13 00:00:00|2021-07-13 00:00:00|2021-07-13 16:54:45|
| 212932758|   B01|      21127810|              Alarms|10/20/2021 05:46:...|10/20/2021 05:47:...|10/20/2021 05:48:...|10/20/2021 05:49:...|10/20/2021 05:53:...|                null|                null|                Fire| 0 Block of BEACH ST|San Francisco|  94133|      B01|         28|0939|               3|       3|            3|  false|               Alarm|        1|   CHIEF|                         2|                     1|                 3|        North Beach|  212932758-B01|POINT (-122.40987...|                 23.0|2021-10-20 00:00:00|2021-10-20 00:00:00|2021-10-20 18:00:04|
| 221201816|   T03|      22054719|      Structure Fire|04/30/2022 02:27:...|04/30/2022 02:28:...|04/30/2022 02:29:...|04/30/2022 02:31:...|04/30/2022 02:31:...|                null|                null|                Fire|   MISSION ST/9TH ST|San Francisco|  94103|      B02|         36|2336|               3|       3|            3|  false|               Alarm|        1|   TRUCK|                         3|                     2|                 6|    South of Market|  221201816-T03|POINT (-122.41471...|                 34.0|2022-04-30 00:00:00|2022-04-30 00:00:00|2022-04-30 14:46:15|
| 211941580| SCRT4|      21082970|    Medical Incident|07/13/2021 12:23:...|07/13/2021 12:28:...|07/13/2021 12:47:...|07/13/2021 12:47:...|                null|                null|                null|                SFPD|100 Block of VICE...|San Francisco|  94127|      B08|         39|8612|               1|       1|            2|  false|Non Life-threatening|        1| SUPPORT|                         1|                     8|                 7| West of Twin Peaks|211941580-SCRT4|POINT (-122.46750...|                 41.0|2021-07-13 00:00:00|2021-07-13 00:00:00|2021-07-13 12:48:46|
| 220181779|    50|      22008631|               Other|01/18/2022 01:38:...|01/18/2022 01:38:...|01/18/2022 01:59:...|01/18/2022 01:59:...|01/18/2022 01:59:...|                null|                null|                Fire|    17-TH DE HARO ST|San Francisco|  94103|      B03|         29|2355|               2|       2|            2|   true|Non Life-threatening|        1|   MEDIC|                         1|                   3.0|                10|        Mission Bay|   220181779-50|POINT (-122.40113...|                  4.0|2022-01-18 00:00:00|2022-01-18 00:00:00|2022-01-18 15:55:14|
| 220111608|   E06|      22005327|              Alarms|01/11/2022 01:05:...|01/11/2022 01:07:...|01/11/2022 01:07:...|01/11/2022 01:08:...|01/11/2022 01:09:...|                null|                null|                Fire|0 Block of SANCHE...|San Francisco|  94114|      B05|         06|5131|               3|       3|            3|   true|               Alarm|        1|  ENGINE|                         1|                     2|                 8|Castro/Upper Market|  220111608-E06|POINT (-122.43120...|                  5.0|2022-01-11 00:00:00|2022-01-11 00:00:00|2022-01-11 13:16:05|
| 220111597| AM110|      22005326|    Medical Incident|01/11/2022 12:59:...|01/11/2022 01:02:...|01/11/2022 01:03:...|01/11/2022 01:03:...|01/11/2022 01:11:...|01/11/2022 02:04:...|01/11/2022 02:52:...|    Code 2 Transport|100 Block of ADDI...|San Francisco|  94131|      B06|         26|8122|               2|       2|            2|  false|Potentially Life-...|        1| PRIVATE|                         1|                     6|                 8|          Glen Park|220111597-AM110|POINT (-122.43239...|                 10.0|2022-01-11 00:00:00|2022-01-11 00:00:00|2022-01-11 15:36:28|
| 220111595|   E07|      22005325|        Outside Fire|01/11/2022 01:01:...|01/11/2022 01:01:...|01/11/2022 01:02:...|01/11/2022 01:03:...|01/11/2022 01:05:...|                null|                null|                Fire|  FLORIDA ST/15TH ST|San Francisco|  94103|      B02|         29|5221|               A|       3|            3|   true|                Fire|        1|  ENGINE|                         1|                     2|                 9|            Mission|  220111595-E07|POINT (-122.41158...|                 20.0|2022-01-11 00:00:00|2022-01-11 00:00:00|2022-01-11 13:07:41|
| 220181524|   E08|      22008605|              Alarms|01/18/2022 12:19:...|01/18/2022 12:22:...|01/18/2022 12:23:...|01/18/2022 12:23:...|01/18/2022 12:24:...|                null|                null|                Fire|600 Block of BRYA...|San Francisco|  94107|      B03|         08|2242|               3|       3|            3|   true|               Alarm|        1|  ENGINE|                         2|                   3.0|                 6|    South of Market|  220181524-E08|POINT (-122.39917...|                 34.0|2022-01-18 00:00:00|2022-01-18 00:00:00|2022-01-18 12:27:19|
| 221201435|   E41|      22054664|        Outside Fire|04/30/2022 12:32:...|04/30/2022 12:34:...|04/30/2022 12:34:...|04/30/2022 12:35:...|04/30/2022 12:37:...|                null|                null|                Fire|  LARKIN ST/GREEN ST|San Francisco|  94109|      B01|         41|1631|               3|       3|            3|  false|                Fire|        1|  ENGINE|                         1|                     1|                 3|       Russian Hill|  221201435-E41|POINT (-122.42050...|                 32.0|2022-04-30 00:00:00|2022-04-30 00:00:00|2022-04-30 12:41:42|
| 211941035|   T05|      21082896|Citizen Assist / ...|07/13/2021 10:00:...|07/13/2021 10:01:...|07/13/2021 10:05:...|07/13/2021 10:05:...|                null|                null|                null|                Fire|0 Block of FRANKL...|San Francisco|  94102|      B02|         36|3212|               3|       3|            3|  false|               Alarm|        1|   TRUCK|                         3|                     2|                 5|       Hayes Valley|  211941035-T05|POINT (-122.42074...|                  9.0|2021-07-13 00:00:00|2021-07-13 00:00:00|2021-07-13 10:10:08|
| 220181030|   B10|      22008545|              Alarms|01/18/2022 10:09:...|01/18/2022 10:11:...|01/18/2022 10:11:...|01/18/2022 10:12:...|                null|                null|                null|                Fire|700 Block of FLOR...|San Francisco|  94110|      B06|         07|5451|               3|       3|            3|  false|               Alarm|        1|   CHIEF|                         2|                     6|                 9|            Mission|  220181030-B10|POINT (-122.41082...|                 20.0|2022-01-18 00:00:00|2022-01-18 00:00:00|2022-01-18 10:21:57|
| 222091290|   T10|      22096127|              Alarms|07/28/2022 11:47:...|07/28/2022 11:49:...|07/28/2022 11:50:...|07/28/2022 11:51:...|                null|                null|                null|                Fire|300 Block of MASO...|San Francisco|  94118|      B05|         21|4462|               3|       3|            3|  false|               Alarm|        1|   TRUCK|                         3|                     5|                 5|  Lone Mountain/USF|  222091290-T10|POINT (-122.44687...|                 18.0|2022-07-28 00:00:00|2022-07-28 00:00:00|2022-07-28 11:56:01|
| 222091252| SCRT3|      22096119|               Other|07/28/2022 11:39:...|07/28/2022 11:39:...|07/28/2022 11:39:...|                null|                null|                null|                null|           Cancelled|900 Block of MARK...|San Francisco|  94103|      B01|         01|2247|               A|       1|            2|  false|                null|        1| SUPPORT|                         1|                     3|                 6|    South of Market|222091252-SCRT3|POINT (-122.40842...|                 34.0|2022-07-28 00:00:00|2022-07-28 00:00:00|2022-07-28 11:40:38|
+----------+------+--------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+--------------------+---------+--------+--------------------------+----------------------+------------------+-------------------+---------------+--------------------+---------------------+-------------------+-------------------+-------------------+
*/

/*
|CallNumber|UnitID|IncidentNumber|    CallType|  CallDate| WatchDate|        ReceivedDtTm|           EntryDtTm|         DispachDtTm|        ResponseDtTm|         OnSceneDtTm|TransportDtTm|HopitalDtTm|CallFinalDisposition|       AvailableDtTm|             Address|         City|Zipcode|Battalion|StationArea| Box|OriginalPriority|Priority|FinalPriority|ALSUnit|CallTypeGroup|NumAlarms|UnitType|UnitSequenceInCallDispatch|FirePreventionDistrict|SupervisorDistrict|Neighborhood|     Location|               RowID|ResponseDelayedinMins|
+----------+------+--------------+------------+----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-----------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------+-------------+--------------------+---------------------+
| 221210313|   E36|      22054955|Outside Fire|05/01/2022|04/30/2022|05/01/2022 02:58:...|05/01/2022 02:59:...|05/01/2022 02:59:...|05/01/2022 03:01:...|05/01/2022 03:02:...|         null|       null|                Fire|05/01/2022 03:05:...|   GOUGH ST/GROVE ST|San Francisco|  94102|      B02|         36|3265|               3|       3|            3|   true|         Fire|        1|  ENGINE|                         1|                     2|                 5|Hayes Valley|221210313-E36|POINT (-122.42316...|                  9.0|
| 220190150|   E29|      22008871|      Alarms|01/19/2022|01/18/2022|01/19/2022 01:42:...|01/19/2022 01:44:...|01/19/2022 01:44:...|01/19/2022 01:46:...|01/19/2022 01:49:...|         null|       null|                Fire|01/19/2022 02:35:...|100 Block of MISS...|San Francisco|  94107|      B03|         29|2431|               3|       3|            3|   true|        Alarm|        1|  ENGINE|                         1|                     3|                10|Potrero Hill|220190150-E29|POINT (-122.39469...|                 26.0|
| 211233271|   T07|      21053032|      Alarms|05/03/2021|05/03/2021|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:29:...|05/03/2021 09:32:...|         null|       null|                Fire|05/03/2021 09:38:...|  0 Block of HOFF ST|San Francisco|  94110|      B02|         07|5236|               A|       3|            3|  false|        Alarm|        1|   TRUCK|                         2|                     2|                 9|     Mission|211233271-T07|POINT (-122.42057...|                 20.0|
| 212933533|   B02|      21127914|      Alarms|10/20/2021|10/20/2021|10/20/2021 10:08:...|10/20/2021 10:09:...|10/20/2021 10:10:...|10/20/2021 10:11:...|                null|         null|       null|                Fire|10/20/2021 10:25:...|200 Block of JONE...|San Francisco|  94102|      B03|         03|1456|               3|       3|            3|  false|        Alarm|        1|   CHIEF|                         3|                     3|                 6|  Tenderloin|212933533-B02|POINT (-122.41243...|                 36.0|
| 221202543|   E41|      22054815|      Alarms|04/30/2022|04/30/2022|04/30/2022 06:35:...|04/30/2022 06:37:...|04/30/2022 06:37:...|04/30/2022 06:38:...|                null|         null|       null|                Fire|04/30/2022 06:40:...|1400 Block of FIL...|San Francisco|  94109|      B04|         16|3146|               3|       3|            3|  false|        Alarm|        1|  ENGINE|                         4|                     4|                 2|Russian Hill|221202543-E41|POINT (-122.42333...|                 32.0|
+----------+------+--------------+------------+----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-----------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------+-------------+--------------------+---------------------+*/

In [None]:
val sfFireFile = "C:/datasets/sf-fire-calls.csv"
val fireDF = spark.read.schema(fireSchema).option("header", "true").csv(sfFireFile)


We now cache the DataFrame. `cache()` is lazy, so the data is only materialized by the first action, here the `count()` that follows.

In [None]:
fireDF.cache()

In [None]:
fireDF.count()
//Long = 2009005
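
The interplay between `cache()` and the first action can be checked on a small DataFrame. This is only a sketch: `numbersDF` and the local session are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("cache-sketch").getOrCreate()

// Hypothetical DataFrame standing in for fireDF
val numbersDF = spark.range(0, 1000).toDF("id")

val levelBefore = numbersDF.storageLevel // StorageLevel.NONE: no caching requested yet
numbersDF.cache()                        // registers the DataFrame for caching, computes nothing
val levelAfter = numbersDF.storageLevel  // a storage level is now assigned
val rows = numbersDF.count()             // first action: computes the partitions and stores them

numbersDF.unpersist()                    // release the cached data when it is no longer needed
```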

In [None]:
fireDF.printSchema()
/*
scala> fireDF.printSchema()
root
 |-- CallNumber: integer (nullable = true)
 |-- UnitID: string (nullable = true)
 |-- IncidentNumber: integer (nullable = true)
 |-- CallType: string (nullable = true)
 |-- CallDate: string (nullable = true)
 |-- WatchDate: string (nullable = true)
 |-- ReceivedDtTm: string (nullable = true)
 |-- EntryDtTm: string (nullable = true)
 |-- DispachDtTm: string (nullable = true)
 |-- ResponseDtTm: string (nullable = true)
 |-- OnSceneDtTm: string (nullable = true)
 |-- TransportDtTm: string (nullable = true)
 |-- HopitalDtTm: string (nullable = true)
 |-- CallFinalDisposition: string (nullable = true)
 |-- AvailableDtTm: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Zipcode: integer (nullable = true)
 |-- Battalion: string (nullable = true)
 |-- StationArea: string (nullable = true)
 |-- Box: string (nullable = true)
 |-- OriginalPriority: string (nullable = true)
 |-- Priority: string (nullable = true)
 |-- FinalPriority: integer (nullable = true)
 |-- ALSUnit: boolean (nullable = true)
 |-- CallTypeGroup: string (nullable = true)
 |-- NumAlarms: integer (nullable = true)
 |-- UnitType: string (nullable = true)
 |-- UnitSequenceInCallDispatch: integer (nullable = true)
 |-- FirePreventionDistrict: string (nullable = true)
 |-- SupervisorDistrict: string (nullable = true)
 |-- Neighborhood: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- RowID: string (nullable = true)
 |-- Delay: float (nullable = true)
*/

In [None]:
fireDF.show(5)

/*
scala> fireDF.show(5)
+----------+------+--------------+------------+----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-----------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------+-------------+--------------------+-----+
|CallNumber|UnitID|IncidentNumber|    CallType|  CallDate| WatchDate|        ReceivedDtTm|           EntryDtTm|         DispachDtTm|        ResponseDtTm|         OnSceneDtTm|TransportDtTm|HopitalDtTm|CallFinalDisposition|       AvailableDtTm|             Address|         City|Zipcode|Battalion|StationArea| Box|OriginalPriority|Priority|FinalPriority|ALSUnit|CallTypeGroup|NumAlarms|UnitType|UnitSequenceInCallDispatch|FirePreventionDistrict|SupervisorDistrict|Neighborhood|     Location|               RowID|Delay|
+----------+------+--------------+------------+----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-----------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------+-------------+--------------------+-----+
| 221210313|   E36|      22054955|Outside Fire|05/01/2022|04/30/2022|05/01/2022 02:58:...|05/01/2022 02:59:...|05/01/2022 02:59:...|05/01/2022 03:01:...|05/01/2022 03:02:...|         null|       null|                Fire|05/01/2022 03:05:...|   GOUGH ST/GROVE ST|San Francisco|  94102|      B02|         36|3265|               3|       3|            3|   true|         Fire|        1|  ENGINE|                         1|                     2|                 5|Hayes Valley|221210313-E36|POINT (-122.42316...|  9.0|
| 220190150|   E29|      22008871|      Alarms|01/19/2022|01/18/2022|01/19/2022 01:42:...|01/19/2022 01:44:...|01/19/2022 01:44:...|01/19/2022 01:46:...|01/19/2022 01:49:...|         null|       null|                Fire|01/19/2022 02:35:...|100 Block of MISS...|San Francisco|  94107|      B03|         29|2431|               3|       3|            3|   true|        Alarm|        1|  ENGINE|                         1|                     3|                10|Potrero Hill|220190150-E29|POINT (-122.39469...| 26.0|
| 211233271|   T07|      21053032|      Alarms|05/03/2021|05/03/2021|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:28:...|05/03/2021 09:29:...|05/03/2021 09:32:...|         null|       null|                Fire|05/03/2021 09:38:...|  0 Block of HOFF ST|San Francisco|  94110|      B02|         07|5236|               A|       3|            3|  false|        Alarm|        1|   TRUCK|                         2|                     2|                 9|     Mission|211233271-T07|POINT (-122.42057...| 20.0|
| 212933533|   B02|      21127914|      Alarms|10/20/2021|10/20/2021|10/20/2021 10:08:...|10/20/2021 10:09:...|10/20/2021 10:10:...|10/20/2021 10:11:...|                null|         null|       null|                Fire|10/20/2021 10:25:...|200 Block of JONE...|San Francisco|  94102|      B03|         03|1456|               3|       3|            3|  false|        Alarm|        1|   CHIEF|                         3|                     3|                 6|  Tenderloin|212933533-B02|POINT (-122.41243...| 36.0|
| 221202543|   E41|      22054815|      Alarms|04/30/2022|04/30/2022|04/30/2022 06:35:...|04/30/2022 06:37:...|04/30/2022 06:37:...|04/30/2022 06:38:...|                null|         null|       null|                Fire|04/30/2022 06:40:...|1400 Block of FIL...|San Francisco|  94109|      B04|         16|3146|               3|       3|            3|  false|        Alarm|        1|  ENGINE|                         4|                     4|                 2|Russian Hill|221202543-E41|POINT (-122.42333...| 32.0|
+----------+------+--------------+------------+----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+-----------+--------------------+--------------------+--------------------+-------------+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------+-------------+--------------------+-----+
only showing top 5 rows*/

Filtering out calls of type "Medical Incident"

In [None]:
val fewFireDF = fireDF.select("IncidentNumber", "AvailableDtTm", "CallType").where($"CallType" =!= "Medical Incident")

fewFireDF.show(5, false)
/*
+--------------+----------------------+------------+
|IncidentNumber|AvailableDtTm         |CallType    |
+--------------+----------------------+------------+
|22054955      |05/01/2022 03:05:00 AM|Outside Fire|
|22008871      |01/19/2022 02:35:26 AM|Alarms      |
|21053032      |05/03/2021 09:38:09 PM|Alarms      |
|21127914      |10/20/2021 10:25:52 PM|Alarms      |
|22054815      |04/30/2022 06:40:08 PM|Alarms      |
+--------------+----------------------+------------+
only showing top 5 rows
*/

### Question 1
**How many distinct call types were made?**  
To be safe, null values in the column must not be counted.

In [None]:
// Answer 1
// countDistinct() ignores null values
fireDF
    .select(countDistinct("CallType"))
    .show()
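
The null handling can be verified on a toy DataFrame. In this sketch, `callsDF` and its four rows are made up for the example; it compares `countDistinct` with `distinct().count()`, which does count null as a value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.countDistinct

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("distinct-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample: two distinct call types plus one null
val callsDF = Seq[(Int, String)](
  (1, "Alarms"), (2, null), (3, "Alarms"), (4, "Structure Fire")
).toDF("CallNumber", "CallType")

// countDistinct skips the null row
val nonNullDistinct = callsDF.select(countDistinct("CallType")).first().getLong(0)

// distinct() keeps null as its own row, so the count is one higher
val distinctRows = callsDF.select("CallType").distinct().count()

println(s"countDistinct = $nonNullDistinct, distinct().count() = $distinctRows")
```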



### Question 2

**What distinct call types were made to the fire department?**

In [None]:
// Answer 2
fireDF
    .select("CallType")
    .where("CallType is not null")
    .distinct.show(false)

### Question 3

**Find all responses whose delay exceeded 5 minutes.**

*Hints*
1. Rename the column `Delay` to `ResponseDelayedinMins`
2. Return a new DataFrame
3. Show all calls where the response to a fire site was delayed by more than 5 minutes

In [None]:
// Answer 3
val newFireDF = fireDF
    .withColumnRenamed("Delay", "ResponseDelayedinMins")

val dfWithDelayOver5Min = newFireDF
    .filter($"ResponseDelayedinMins" > 5)
    
dfWithDelayOver5Min.show()


### Date transformations  
We now:  
1. Convert the string dates to Spark Timestamps so that we can run date-based queries later  
2. Return the transformed DataFrame  
3. Cache the new DataFrame  

In [None]:
val fireTSDF = newFireDF
    .withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy"))
    .drop("CallDate")
    .withColumn("OnWatchDate", to_timestamp(col("WatchDate"), "MM/dd/yyyy"))
    .drop("WatchDate")
    .withColumn("AvailableDtTS", to_timestamp(col("AvailableDtTm"), "MM/dd/yyyy hh:mm:ss a"))
    .drop("AvailableDtTm")

fireTSDF.cache()
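
To see what the `"MM/dd/yyyy hh:mm:ss a"` pattern does with the AM/PM marker, here is a minimal sketch on two sample strings taken from the `AvailableDtTm` values shown earlier. `rawDF`, `parsedDF`, and the local session are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hour, to_timestamp}

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("timestamp-sketch").getOrCreate()
import spark.implicits._

// Two sample values in the same 12-hour format as the dataset
val rawDF = Seq("05/03/2021 09:38:09 PM", "01/19/2022 02:35:26 AM").toDF("AvailableDtTm")

val parsedDF = rawDF
  // "hh" is the 12-hour clock and "a" the AM/PM marker
  .withColumn("AvailableDtTS", to_timestamp(col("AvailableDtTm"), "MM/dd/yyyy hh:mm:ss a"))
  // Extract the hour on the 24-hour clock to check the conversion
  .withColumn("Hour", hour(col("AvailableDtTS")))

val hours = parsedDF.select("Hour").as[Int].collect().toList.sorted

parsedDF.show(false)
```

Once the column is a Timestamp, functions such as `year()`, `weekofyear()`, and `hour()` can be applied directly, which is what the later questions rely on.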

### Question 4
**What are the most common call types?**

In [None]:
// Answer 4
val topCallTypes = newFireDF
    .groupBy("CallType")
    .count()
    .orderBy(desc("count"))
//------------------
topCallTypes.show()

### Question 5-a
**Which zip codes appear in the most common calls?**

In [None]:
// Answer 5-a
newFireDF
    .groupBy("CallType","Zipcode")
    .count()
    .orderBy($"count".desc)
    .select("Zipcode","CallType","count")
    .limit(5)
    .show()


### Question 5-b
**Which San Francisco neighborhoods are in zip codes 94102 and 94103?**

In [None]:
// Answer 5-b
// Zipcode is an IntegerType column in fireSchema, so compare against integers
val neighborhoods = newFireDF
    .filter(col("Zipcode").isin(94102, 94103))
    .select("Neighborhood", "Zipcode")
    .distinct()

neighborhoods.show()


### Question 6
**Determine the total number of calls, as well as the mean, minimum, and maximum call response time.**

In [None]:
// Answer 6

// count("*") counts rows, i.e. unit responses: a call answered by several units appears several times
val statsDF = newFireDF
    .agg(
        count("*").alias("TotalCalls"),
        avg("ResponseDelayedinMins").alias("AvgResponseTime"),
        min("ResponseDelayedinMins").alias("MinResponseTime"),
        max("ResponseDelayedinMins").alias("MaxResponseTime")
    )

statsDF.show()
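
One detail worth keeping in mind when reading these statistics is that `count("*")` counts every row, while `avg`, `min`, `max`, and `count` on a named column skip nulls. A small sketch, where `delaysDF` and its three rows are invented for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample: three responses, one of them without a recorded delay
val delaysDF = Seq[(Int, Option[Float])](
  (1, Some(2.0f)), (2, None), (3, Some(6.0f))
).toDF("CallNumber", "ResponseDelayedinMins")

val statsRow = delaysDF.agg(
  count("*").alias("TotalRows"),                         // all rows, nulls included
  count("ResponseDelayedinMins").alias("RowsWithDelay"), // non-null delays only
  avg("ResponseDelayedinMins").alias("AvgResponseTime")  // average over non-null delays
).first()

val totalRows = statsRow.getAs[Long]("TotalRows")
val rowsWithDelay = statsRow.getAs[Long]("RowsWithDelay")
val avgDelay = statsRow.getAs[Double]("AvgResponseTime")

println(s"rows = $totalRows, rows with delay = $rowsWithDelay, average = $avgDelay")
```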


### Question 7-a
**How many distinct years are found in this Dataset?**  
This dataset contains data from 2000 to 2018. You can use the Spark `year()` function on Timestamp dates.

In [None]:
// Answer 7-a
val distinctYears = fireTSDF
    .select(year($"IncidentDate").alias("Year"))
    .distinct()
    .orderBy("Year")

val countDistinctYears = distinctYears.count()
//Long 1

### Question 7-b
**Which week of 2018 had the most fire calls?**

In [None]:
// Answer 7-b

// Keep only the 2018 data
val fire2018DF = fireTSDF
    .filter(year($"IncidentDate") === 2018)

// Group by week of the year, count the calls per week, then sort by call count in descending order
val callsByWeekDF = fire2018DF
    .groupBy(weekofyear($"IncidentDate").alias("Week"))
    .count()
    .orderBy(desc("count"))

// The first row of the sorted DataFrame is the week with the most calls
// (first() throws if fire2018DF is empty)
val mostCallsWeek = callsByWeekDF
    .first()
    .getAs[Int]("Week")

// Print the result
println(s"Week $mostCallsWeek of 2018 had the most fire calls.")
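
If the filtered year may contain no rows, a variant that returns an `Option` avoids the exception raised by `first()`. The sketch below is one possible way to write it; `datesDF`, `busiestWeek`, and the three sample dates are hypothetical and only serve the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc, to_date, weekofyear, year}

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("week-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample: two calls in ISO week 1 of 2018 and one in week 2
val datesDF = Seq("2018-01-01", "2018-01-02", "2018-01-08").toDF("RawDate")
  .withColumn("IncidentDate", to_date(col("RawDate")))

// Returns the busiest week of the given year, or None when the year has no rows
def busiestWeek(targetYear: Int): Option[Int] =
  datesDF
    .filter(year(col("IncidentDate")) === targetYear)
    .groupBy(weekofyear(col("IncidentDate")).alias("Week"))
    .count()
    .orderBy(desc("count"))
    .take(1)       // at most one row, never throws on an empty result
    .headOption
    .map(_.getAs[Int]("Week"))

println(busiestWeek(2018))
println(busiestWeek(2019))
```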

### Question 8
**Which San Francisco neighborhoods had the worst response time in 2018?**

In [None]:
// Answer 8
val responseTimeByNeighborhoodDF = fireTSDF.filter(year($"IncidentDate") === 2018)
    .groupBy("Neighborhood")
    .agg(avg("ResponseDelayedinMins").alias("AvgResponseTime"))
    .orderBy(desc("AvgResponseTime"))

responseTimeByNeighborhoodDF.show()

### Question 9

**How can the DataFrame be stored as Parquet files?**

In [None]:
// Answer 9
fireTSDF.write.format("parquet").save("C:/datasets/parquet_format")

### Question 10
**How can the data stored in Parquet format be read back?**

In [None]:
// Answer 10
val parquetDF = spark.read.parquet("C:/datasets/parquet_format")


parquetDF.show()
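
The write and read steps can be exercised together on a small DataFrame. This is a sketch under stated assumptions: the temporary directory, `smallDF`, and the choice of `mode("overwrite")` are not part of the answers above. The overwrite mode is used here because a plain `save` fails when the target path already exists, which happens as soon as the cell is run twice.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in a notebook, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("parquet-sketch").getOrCreate()
import spark.implicits._

// Hypothetical output location inside a fresh temporary directory
val outDir = Files.createTempDirectory("parquet-sketch").resolve("calls").toString

// Hypothetical sample standing in for fireTSDF
val smallDF = Seq((1, "Alarms"), (2, "Structure Fire")).toDF("CallNumber", "CallType")

// Writing twice shows that overwrite mode tolerates an existing target
smallDF.write.mode("overwrite").parquet(outDir)
smallDF.write.mode("overwrite").parquet(outDir)

// Parquet files carry their own schema, so none has to be supplied when reading
val reloadedDF = spark.read.parquet(outDir)
reloadedDF.printSchema()
reloadedDF.show()
```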

## END