# Big Data with Spark: Final Project

*`Name: Mouhamadane MBOUP`*

## Problem statement

This project uses Apache Spark to analyze and process the **[San Francisco Fire Department Calls](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3)** data in order to produce a few KPIs (*Key Performance Indicators*). The **SF Fire dataset** contains the responses of all fire units to calls. Each record includes the call number, incident number, address, unit identifier, call type, and disposition. All relevant time intervals are also included. Because the dataset is response-based and most calls involve several units, there are multiple records for each call number. Addresses are associated with a block number, an intersection, or a call box, not with a specific address.

**More details on the data description [here](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3)**

## Assignment

The goal of this project is to understand the SF Fire dataset well enough to answer the questions below with appropriate Spark/Scala code.

- Code must be readable and well indented.
- Remember to justify each answer in a Markdown cell.


#### Note:
- The project is individual (one student).
- Deadline: **Thursday, 22 January 2021**

## Q1. Import the required Spark modules

In [1]:
import $ivy.`org.apache.spark::spark-sql:2.4.5`
import $ivy.`sh.almond::almond-spark:0.10.9`
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql._
// spark.implicits._ can only be imported once the SparkSession `spark` exists (it is created in Q2)

import $ivy.$
import $ivy.$

## Q2. Create the Spark Session

In [2]:
val spark = {
  NotebookSparkSession.builder()
    .master("local[*]")
    .getOrCreate()
}

Loading spark-stubs


Getting spark JARs


log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


Creating SparkSession


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/12 12:19:53 INFO SparkContext: Running Spark version 2.4.5
21/01/12 12:19:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/01/12 12:19:54 INFO SparkContext: Submitted application: 8515def5-d0b2-4810-87f4-8d2f364ee49a
21/01/12 12:19:54 INFO SecurityManager: Changing view acls to: muhammad
21/01/12 12:19:54 INFO SecurityManager: Changing modify acls to: muhammad
21/01/12 12:19:54 INFO SecurityManager: Changing view acls groups to: 
21/01/12 12:19:54 INFO SecurityManager: Changing modify acls groups to: 
21/01/12 12:19:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(muhammad); groups with view permissions: Set(); users  with modify permissions: Set(muhammad); groups with modify permissions: Set()
21/01/12 12:19:54 INFO Utils: Successfully started service

import org.apache.spark.sql._

spark: SparkSession = org.apache.spark.sql.SparkSession@52b25cc1

In [20]:
import org.apache.log4j.{Level, Logger}

val rootLogger = Logger.getRootLogger()
rootLogger.setLevel(Level.ERROR)

Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.spark-project").setLevel(Level.WARN)

import org.apache.log4j.{Level, Logger}

rootLogger: Logger = org.apache.log4j.spi.RootLogger@6701f825

## Q3. Load the data

Use the `fireSchema` defined in the following cell to load the data.

In [18]:
import org.apache.spark.sql.types._

// Explicit schema: skips the inference pass and fixes the column types up front
val fireSchema = StructType(Array(
  StructField("CallNumber", IntegerType, true),
  StructField("UnitID", StringType, true),
  StructField("IncidentNumber", IntegerType, true),
  StructField("CallType", StringType, true),
  StructField("CallDate", StringType, true),
  StructField("WatchDate", StringType, true),
  StructField("CallFinalDisposition", StringType, true),
  StructField("AvailableDtTm", StringType, true),
  StructField("Address", StringType, true),
  StructField("City", StringType, true),
  StructField("Zipcode", IntegerType, true),
  StructField("Battalion", StringType, true),
  StructField("StationArea", StringType, true),
  StructField("Box", StringType, true),
  StructField("OriginalPriority", StringType, true),
  StructField("Priority", StringType, true),
  StructField("FinalPriority", IntegerType, true),
  StructField("ALSUnit", BooleanType, true),
  StructField("CallTypeGroup", StringType, true),
  StructField("NumAlarms", IntegerType, true),
  StructField("UnitType", StringType, true),
  StructField("UnitSequenceInCallDispatch", IntegerType, true),
  StructField("FirePreventionDistrict", StringType, true),
  StructField("SupervisorDistrict", StringType, true),
  StructField("Neighborhood", StringType, true),
  StructField("Location", StringType, true),
  StructField("RowID", StringType, true),
  StructField("Delay", FloatType, true)))

// Read the CSV file (with its header row) using the schema above
val data = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", ",")
  .schema(fireSchema)
  .load("data/sf-fire-calls.csv")


21/01/12 18:02:35 INFO InMemoryFileIndex: It took 1 ms to list leaf files for 1 paths.


import org.apache.spark.sql.types._

fireSchema: StructType = StructType(
  StructField("CallNumber", IntegerType, true, {}),
  StructField("UnitID", StringType, true, {}),
  StructField("IncidentNumber", IntegerType, true, {}),
  StructField("CallType", StringType, true, {}),
  StructField("CallDate", StringType, true, {}),
  StructField("WatchDate", StringType, true, {}),
  StructField("CallFinalDisposition", StringType, true, {}),
  StructField("AvailableDtTm", StringType, true, {}),
  StructField("Address", StringType, true, {}),
  StructField("City", StringType, true, {}),
  StructField("Zipcode", IntegerType, true, {}),
  StructField("Battalion", StringType, true, {}),
  StructField(

## Q4. Cache the loaded data

Hint: `dataframe.cache().count()`

In [21]:
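import org.apache.spark.storage.StorageLevel
// Both calls are lazy: nothing is stored until an action such as count() runs.
// On a DataFrame, cache() is shorthand for persist(StorageLevel.MEMORY_AND_DISK),
// so the second call is redundant.
data.cache
data.persist(StorageLevel.MEMORY_AND_DISK)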

21/01/12 18:13:27 WARN CacheManager: Asked to cache already cached data.
21/01/12 18:13:27 WARN CacheManager: Asked to cache already cached data.


import org.apache.spark.storage.StorageLevel
res20_1: DataFrame = [CallNumber: int, UnitID: string ... 26 more fields]
res20_2: DataFrame = [CallNumber: int, UnitID: string ... 26 more fields]

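Caching is worthwhile when several actions are run on the same DataFrame: the data is read and parsed once, then reused by later actions instead of being recomputed from the CSV file each time.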
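The hint for this question pairs `cache()` with `count()`. A minimal sketch of that pattern follows; the helper name `cacheAndCount` is hypothetical and only wraps the two calls from the hint.

```scala
import org.apache.spark.sql.DataFrame

// cache() only marks the DataFrame for caching (it is lazy);
// count() is an action, so it scans every row and fills the cache.
def cacheAndCount(df: DataFrame): Long =
  df.cache().count()
```

In this notebook it would be called as `cacheAndCount(data)`; later actions on `data` can then read from the cache.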

## Q5. Remove all calls of type `Medical Incident`

Hint: apply the `.filter()` method to the `CallType` column with the `=!=` operator

In [29]:
val data_test = data.filter(data("CallType") =!= "Medical Incident")
data_test.select("CallType", "Delay").show(2)

+--------------+-----+
|      CallType|Delay|
+--------------+-----+
|Structure Fire| 2.95|
|  Vehicle Fire|  1.5|
+--------------+-----+
only showing top 2 rows



data_test: Dataset[Row] = [CallNumber: int, UnitID: string ... 26 more fields]

## Q6. How many distinct call types were made?

In [31]:
val calltype = data.select("CallType").distinct().count()
println(calltype)

30


calltype: Long = 30L

## Q7. Which types of calls were made to the fire department?

In [0]:
// Your code goes here
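One possible approach, given as a sketch rather than the expected answer: select the `CallType` column, drop nulls, and keep the distinct values. The helper name `distinctCallTypes` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Lists each call type once, in alphabetical order, ignoring null values.
def distinctCallTypes(df: DataFrame): DataFrame =
  df.select("CallType")
    .where(col("CallType").isNotNull)
    .distinct()
    .orderBy("CallType")
```

With the DataFrame loaded in Q3, `distinctCallTypes(data).show(50, false)` would print the full, untruncated list.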

## Q8. Find all responses whose delay exceeds 5 minutes

Hint:
1. Rename the column `Delay` -> `ReponseDelayedinMins`
2. Return a new DataFrame
3. Display all calls where the response to the fire site was delayed by more than 5 minutes

In [0]:
// Your code goes here
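The three steps of the hint can be sketched as follows. The helper name `delayedResponses` and its `thresholdMins` parameter are hypothetical; the column name is spelled exactly as in the hint.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Renames Delay, then keeps only the rows above the threshold.
// DataFrames are immutable, so this returns a new DataFrame.
def delayedResponses(df: DataFrame, thresholdMins: Double = 5.0): DataFrame =
  df.withColumnRenamed("Delay", "ReponseDelayedinMins")
    .where(col("ReponseDelayedinMins") > thresholdMins)
```

For example, `delayedResponses(data).select("IncidentNumber", "ReponseDelayedinMins").show(5, false)` would display a few of the delayed responses.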

## Q9. Convert the date columns to timestamps

Hint:
* `CallDate` -> `IncidentDate`
* `WatchDate` -> `OnWatchDate`
* `AvailableDtTm` -> `AvailableDtTS`

Example code for `CallDate`:
`dataframe.withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy")).drop("CallDate")`

In [0]:
// Your code goes here
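A sketch that applies the example from the hint to all three columns. The `MM/dd/yyyy` pattern for `CallDate` comes from the hint; reusing it for `WatchDate` and using `MM/dd/yyyy hh:mm:ss a` for `AvailableDtTm` are assumptions that should be checked against a few raw rows. The helper name `withTimestamps` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, to_timestamp}

// Replaces each string date column with a timestamp column under its new name.
def withTimestamps(df: DataFrame): DataFrame =
  df.withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy"))
    .drop("CallDate")
    // Assumption: WatchDate uses the same date-only format as CallDate
    .withColumn("OnWatchDate", to_timestamp(col("WatchDate"), "MM/dd/yyyy"))
    .drop("WatchDate")
    // Assumption: date followed by a 12-hour clock time and an AM/PM marker
    .withColumn("AvailableDtTS", to_timestamp(col("AvailableDtTm"), "MM/dd/yyyy hh:mm:ss a"))
    .drop("AvailableDtTm")
```

Calling `withTimestamps(data).select("IncidentDate", "OnWatchDate", "AvailableDtTS").show(5, false)` is a quick way to confirm that no column came back entirely null, which would indicate a wrong pattern.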

## Q10. What are the most common call types?

In [0]:
// Your code goes here
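One way to answer this, as a sketch: group by `CallType`, count the rows in each group, and sort by descending count. The helper name `topCallTypes` and its parameter `n` are hypothetical. Note that rows are unit responses, so a call answered by several units is counted several times.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, desc}

// Returns the n call types with the most response records.
def topCallTypes(df: DataFrame, n: Int = 10): DataFrame =
  df.where(col("CallType").isNotNull)
    .groupBy("CallType")
    .count()
    .orderBy(desc("count"))
    .limit(n)
```

In the notebook this would be used as `topCallTypes(data).show(false)`.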

## Q11. Which zip codes appear in the most common calls?

In [28]:
// Your code goes here
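A sketch under one reading of the question: it assumes the question refers to the `Zipcode` column (not the `Box` column) and ranks (call type, zip code) pairs by number of records. The helper name `zipcodesForTopCallTypes` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, desc}

// Counts records per (CallType, Zipcode) pair and returns the n largest groups.
def zipcodesForTopCallTypes(df: DataFrame, n: Int = 10): DataFrame =
  df.where(col("CallType").isNotNull)
    .groupBy("CallType", "Zipcode")
    .count()
    .orderBy(desc("count"))
    .limit(n)
```

`zipcodesForTopCallTypes(data).show(false)` would then show which zip codes dominate the most frequent call types.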

## Q12. Which San Francisco neighborhoods have the zip codes `94102` and `94103`?

In [30]:
// Your code goes here
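A possible sketch: filter on `Zipcode` with `isin` and keep the distinct (neighborhood, zip code) pairs. The helper name `neighborhoodsIn` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Returns the distinct neighborhoods found in the given zip codes.
def neighborhoodsIn(df: DataFrame, zipcodes: Int*): DataFrame =
  df.select("Neighborhood", "Zipcode")
    .where(col("Zipcode").isin(zipcodes: _*))
    .distinct()
```

For this question the call would be `neighborhoodsIn(data, 94102, 94103).show(false)`.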

## Q13. Determine the total number of calls, as well as the average, minimum, and maximum call response time

In [0]:
// Your code goes here
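A sketch using a single `agg` call. Because the dataset holds one record per unit response, it reports both the number of records and the number of distinct `CallNumber` values; which of the two counts as "total calls" is left open. The helper name `delaySummary` and the output column names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, count, countDistinct, lit, max, min}

// One-row summary: record count, distinct call count, and Delay statistics.
def delaySummary(df: DataFrame): DataFrame =
  df.agg(
    count(lit(1)).as("Responses"),
    countDistinct("CallNumber").as("DistinctCalls"),
    avg("Delay").as("AvgDelay"),
    min("Delay").as("MinDelay"),
    max("Delay").as("MaxDelay"))
```

`delaySummary(data).show()` prints the five values side by side.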

## Q14. How many distinct years are in this dataset?

Hint: apply the `year()` function to the `IncidentDate` column

In [35]:
// Your code goes here
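A sketch that parses the raw `CallDate` column directly, so it does not depend on a previously converted DataFrame; if an `IncidentDate` timestamp column is already available, `year(col("IncidentDate"))` can be used instead. The helper name `distinctYears` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, to_timestamp, year}

// Extracts the year of each call date and returns the distinct years, sorted.
def distinctYears(df: DataFrame): DataFrame =
  df.select(year(to_timestamp(col("CallDate"), "MM/dd/yyyy")).as("Year"))
    .distinct()
    .orderBy("Year")
```

`distinctYears(data).count()` gives the number of years and `distinctYears(data).show(50)` lists them.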

## Q15. Which week of 2018 had the most fire calls?

In [0]:
// Your code goes here
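One possible sketch: parse `CallDate`, keep the rows of the requested year, group by `weekofyear`, and sort by descending count. It counts response records, not distinct call numbers, and `weekofyear` follows ISO 8601 week numbering, so the first or last days of a year can fall into week 52, 53, or 1. The helper name `busiestWeeks` and its parameters are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, desc, to_timestamp, weekofyear, year}

// Returns the n weeks of the given year with the most records.
def busiestWeeks(df: DataFrame, yr: Int = 2018, n: Int = 1): DataFrame = {
  val withDate = df.withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy"))
  withDate
    .where(year(col("IncidentDate")) === yr)
    .groupBy(weekofyear(col("IncidentDate")).as("Week"))
    .count()
    .orderBy(desc("count"))
    .limit(n)
}
```

`busiestWeeks(data).show()` returns the single busiest week; passing `n = 5` shows the runners-up for comparison.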

## Q16. Which San Francisco neighborhoods had the worst response times in 2018?

In [37]:
// Your code goes here
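A sketch that interprets "worst" as the highest average `Delay` per neighborhood, which is an assumption; ranking by maximum delay would be an equally defensible reading. The helper name `slowestNeighborhoods` and its parameters are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, desc, to_timestamp, year}

// Ranks neighborhoods by average Delay for the given year, slowest first.
def slowestNeighborhoods(df: DataFrame, yr: Int = 2018, n: Int = 10): DataFrame =
  df.where(year(to_timestamp(col("CallDate"), "MM/dd/yyyy")) === yr)
    .groupBy("Neighborhood")
    .agg(avg("Delay").as("AvgDelay"))
    .orderBy(desc("AvgDelay"))
    .limit(n)
```

In the notebook: `slowestNeighborhoods(data).show(false)`.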

## Q17. Store the data as Parquet files

In [0]:
// Your code goes here
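A minimal sketch using the DataFrame writer. The helper name `saveAsParquet` is hypothetical, and the choice of `SaveMode.Overwrite` is an assumption made so that the cell can be re-run without failing on an existing directory.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Writes the DataFrame as a directory of Parquet part files at the given path.
def saveAsParquet(df: DataFrame, path: String): Unit =
  df.write
    .mode(SaveMode.Overwrite)
    .parquet(path)
```

For example, `saveAsParquet(data, "data/sf-fire-calls.parquet")`, where the output path is only an illustration.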

## Q18. Reload the data stored in Parquet format

In [0]:
// Your code goes here
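A minimal sketch of the read side. Parquet files carry their own schema, so no `fireSchema` is needed here. The helper name `loadParquet` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Reads a Parquet directory back into a DataFrame; the schema comes from the files.
def loadParquet(spark: SparkSession, path: String): DataFrame =
  spark.read.parquet(path)
```

For example, `loadParquet(spark, "data/sf-fire-calls.parquet").printSchema()`, where the path is only an illustration and must match the location used when writing.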