# Big Data with Spark: Final Project

*`Name: Lamane DIENG`*

## Problem Statement

This project uses Apache Spark to analyze and process the **[San Francisco Fire Department Calls](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3)** data in order to produce a few KPIs (*Key Performance Indicators*). The **SF Fire Dataset** contains the responses of all fire units to calls. Each record includes the call number, the incident number, the address, the unit identifier, the call type and the disposition. All relevant time intervals are also included. Because the dataset is response-based and most calls involve several units, there are multiple records for each call number. Addresses are associated with a block number, an intersection or a call box, not with a specific address.
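Because each row is one unit's response rather than one call, counting rows and counting calls give different numbers. The plain-Scala sketch below (no Spark) illustrates this on a few hypothetical rows: the call numbers and unit IDs are invented. On the DataFrame loaded in Q3, the Spark counterparts would be `DataFire.count()` for rows and `DataFire.select(countDistinct("CallNumber"))` for calls.

```scala
// Plain-Scala sketch (no Spark): hypothetical rows, one per responding unit.
object RecordsPerCall {
  final case class Response(callNumber: Int, unitId: String)

  // Invented sample: call 1001 involves three units, call 1002 only one.
  val sample: Seq[Response] = Seq(
    Response(1001, "E01"),
    Response(1001, "T01"),
    Response(1001, "B01"),
    Response(1002, "M05")
  )

  // Number of rows (responses), like DataFire.count().
  val recordCount: Int = sample.size

  // Number of calls, like countDistinct("CallNumber").
  val callCount: Int = sample.map(_.callNumber).distinct.size

  // Units dispatched per call.
  val unitsPerCall: Map[Int, Int] =
    sample.groupBy(_.callNumber).map { case (call, rows) => call -> rows.size }
}
```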

**More details on the data description [here](https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3)**

## Work to Do
The goal of this project is to understand the SF Fire dataset well enough to answer the questions with appropriate Spark/Scala code.

- Readable, well-indented code.
- Don't forget to justify your answers as comments in the Markdown cells.


## Q1. Import the necessary Spark modules

In [1]:
// Your code goes here
import $ivy.`org.apache.spark::spark-sql:2.4.5` // Or use any other 2.x version here
import $ivy.`sh.almond::almond-spark:0.10.0`
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql._

Logger.getLogger("org").setLevel(Level.OFF)

[32mimport [39m[36m$ivy.$                                   // Or use any other 2.x version here
[39m
[32mimport [39m[36m$ivy.$                               
[39m
[32mimport [39m[36morg.apache.log4j.{Level, Logger}
[39m
[32mimport [39m[36morg.apache.spark.sql._

[39m

## Q2. Create the Spark session

In [2]:
// Your code goes here
import org.apache.log4j.{Level, Logger}
import java.util.Properties

Logger.getLogger("org.spark-project").setLevel(Level.WARN)

[32mimport [39m[36morg.apache.log4j.{Level, Logger}
[39m
[32mimport [39m[36mjava.util.Properties

[39m

In [3]:
val spark = {
  NotebookSparkSession.builder()
    .master("local[4]")
    .config("spark.testing.memory", "2147480000")
    .getOrCreate()
}

Loading spark-stubs
Getting spark JARs
Creating SparkSession


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties


[36mspark[39m: [32mSparkSession[39m = org.apache.spark.sql.SparkSession@62cc94

In [4]:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.countDistinct
import org.apache.spark.storage.StorageLevel._
import org.apache.spark.sql.types.DateType
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}
import org.apache.spark.sql.functions.count
import org.apache.spark.sql.functions._

[32mimport [39m[36morg.apache.spark.sql.types._
[39m
[32mimport [39m[36morg.apache.spark.sql.functions.countDistinct
[39m
[32mimport [39m[36morg.apache.spark.storage.StorageLevel._
[39m
[32mimport [39m[36morg.apache.spark.sql.types.DateType
[39m
[32mimport [39m[36morg.apache.spark.sql.functions.{col, to_date, to_timestamp}
[39m
[32mimport [39m[36morg.apache.spark.sql.functions.count
[39m
[32mimport [39m[36morg.apache.spark.sql.functions._[39m

## Q3. Load the data

In [5]:
val file = "data/sf-fire-calls.csv"
val fireSchema = StructType(Array(StructField("CallNumber", IntegerType, true),
  StructField("UnitID", StringType, true),
  StructField("IncidentNumber", IntegerType, true),
  StructField("CallType", StringType, true),                  
  StructField("CallDate", StringType, true),      
  StructField("WatchDate", StringType, true),
  StructField("CallFinalDisposition", StringType, true),
  StructField("AvailableDtTm", StringType, true),
  StructField("Address", StringType, true),       
  StructField("City", StringType, true),       
  StructField("Zipcode", IntegerType, true),       
  StructField("Battalion", StringType, true),                 
  StructField("StationArea", StringType, true),       
  StructField("Box", StringType, true),       
  StructField("OriginalPriority", StringType, true),       
  StructField("Priority", StringType, true),       
  StructField("FinalPriority", IntegerType, true),       
  StructField("ALSUnit", BooleanType, true),       
  StructField("CallTypeGroup", StringType, true),
  StructField("NumAlarms", IntegerType, true),
  StructField("UnitType", StringType, true),
  StructField("UnitSequenceInCallDispatch", IntegerType, true),
  StructField("FirePreventionDistrict", StringType, true),
  StructField("SupervisorDistrict", StringType, true),
  StructField("Neighborhood", StringType, true),
  StructField("Location", StringType, true),
  StructField("RowID", StringType, true),
  StructField("Delay", FloatType, true)))


val DataFire = spark.read
                    .option("header","true")
                    .schema(fireSchema)
                    .csv(file)
 

[36mfile[39m: [32mString[39m = [32m"data/sf-fire-calls.csv"[39m
[36mfireSchema[39m: [32mStructType[39m = [33mStructType[39m(
  [33mStructField[39m([32m"CallNumber"[39m, IntegerType, true, {}),
  [33mStructField[39m([32m"UnitID"[39m, StringType, true, {}),
  [33mStructField[39m([32m"IncidentNumber"[39m, IntegerType, true, {}),
  [33mStructField[39m([32m"CallType"[39m, StringType, true, {}),
  [33mStructField[39m([32m"CallDate"[39m, StringType, true, {}),
  [33mStructField[39m([32m"WatchDate"[39m, StringType, true, {}),
  [33mStructField[39m([32m"CallFinalDisposition"[39m, StringType, true, {}),
  [33mStructField[39m([32m"AvailableDtTm"[39m, StringType, true, {}),
  [33mStructField[39m([32m"Address"[39m, StringType, true, {}),
  [33mStructField[39m([32m"City"[39m, StringType, true, {}),
  [33mStructField[39m([32m"Zipcode"[39m, IntegerType, true, {}),
  [33mStructField[39m([32m"Battalion"[39m, StringType, true, {}),
  [33mStruc

## Q4. Cache the loaded data

In [6]:
DataFire.cache() // lazy: the data is actually cached by the first action that scans it

[36mres5[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]

## Q5. Remove all calls of type `Medical Incident`

In [7]:
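// Keep every call whose type is not "Medical Incident" in a new DataFrame, then display it
val noMedicalDF = DataFire.where("CallType != 'Medical Incident'")
noMedicalDF.show(false)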

+----------+------+--------------+-----------------------------+----------+----------+--------------------+----------------------+---------------------------+----+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------+--------------------------+----------------------+------------------+------------------------------+-------------------------------------+-------------+---------+
|CallNumber|UnitID|IncidentNumber|CallType                     |CallDate  |WatchDate |CallFinalDisposition|AvailableDtTm         |Address                    |City|Zipcode|Battalion|StationArea|Box |OriginalPriority|Priority|FinalPriority|ALSUnit|CallTypeGroup|NumAlarms|UnitType|UnitSequenceInCallDispatch|FirePreventionDistrict|SupervisorDistrict|Neighborhood                  |Location                             |RowID        |Delay    |
+----------+------+--------------+-----------------------------+----------+----------+--------------------+---------

|20120295  |B05   |2003756       |Alarms                       |01/12/2002|01/12/2002|Other               |01/12/2002 07:54:42 PM|500 Block of STEINER ST    |SF  |94117  |B05      |21         |3632|3               |3       |3            |false  |null         |1        |CHIEF   |2                         |5                     |5                 |Hayes Valley                  |(37.7742331365027, -122.432492947831)|020120295-B05|2.0833333|
|20120311  |E07   |2003770       |Smoke Investigation (Outside)|01/12/2002|01/12/2002|Other               |01/12/2002 08:44:01 PM|900 Block of SHOTWELL ST   |SF  |94110  |B06      |07         |0544|3               |3       |3            |false  |null         |1        |ENGINE  |1                         |6                     |9                 |Mission                       |(37.7532406253685, -122.415177223195)|020120311-E07|2.1166666|
|20120322  |D2    |2003777       |Structure Fire               |01/12/2002|01/12/2002|Other               |01/12/200
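
One detail of the filter above: the condition `!= 'Medical Incident'` evaluates to NULL, not true, when `CallType` is missing, so such rows (if the dataset has any) are dropped as well. The plain-Scala sketch below models this three-valued logic with `Option` on hypothetical values; in Spark, keeping missing types would take a condition such as `col("CallType").isNull || col("CallType") =!= "Medical Incident"`.

```scala
// Plain-Scala model of SQL three-valued logic; None stands for a NULL CallType.
object NotEqualWithNulls {
  // NULL compared with anything yields NULL (None); otherwise a normal comparison.
  def sqlNotEqual(value: Option[String], literal: String): Option[Boolean] =
    value.map(_ != literal)

  // A filter keeps a row only when the predicate is TRUE (FALSE and NULL are both dropped).
  def keepNot(rows: Seq[Option[String]], literal: String): Seq[Option[String]] =
    rows.filter(r => sqlNotEqual(r, literal).contains(true))

  // Variant that also keeps NULLs, like isNull || =!= in Spark.
  def keepNotOrNull(rows: Seq[Option[String]], literal: String): Seq[Option[String]] =
    rows.filter(r => r.isEmpty || sqlNotEqual(r, literal).contains(true))

  // Hypothetical CallType values, one of them missing.
  val callTypes: Seq[Option[String]] =
    Seq(Some("Medical Incident"), Some("Alarms"), None, Some("Structure Fire"))
}
```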

## Q6. How many distinct call types were made?

In [8]:
DataFire
        .select(countDistinct("CallType") as "nombre de type d'appel")
        .show(false)

+----------------------+
|nombre de type d'appel|
+----------------------+
|30                    |
+----------------------+



There are 30 distinct call types.
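`countDistinct` ignores nulls, whereas `select("CallType").distinct()` (used in Q7) keeps a null as its own row, so the two can differ by one if some `CallType` values are missing. A plain-Scala sketch of both behaviors, on hypothetical values:

```scala
// Plain-Scala sketch; None stands for a NULL CallType (hypothetical values).
object DistinctVsCountDistinct {
  val callTypes: Seq[Option[String]] =
    Seq(Some("Alarms"), Some("Alarms"), None, Some("Structure Fire"))

  // Like countDistinct("CallType"): NULLs are dropped, duplicates are counted once.
  val countDistinctLike: Int = callTypes.flatten.distinct.size

  // Like select("CallType").distinct().count(): NULL survives as one distinct row.
  val distinctRowsLike: Int = callTypes.distinct.size
}
```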

## Q7. Which call types were made to the fire department?

In [9]:
DataFire
        .select("CallType")
        .distinct()
        .show(30,false)

+--------------------------------------------+
|CallType                                    |
+--------------------------------------------+
|Elevator / Escalator Rescue                 |
|Marine Fire                                 |
|Aircraft Emergency                          |
|Confined Space / Structure Collapse         |
|Administrative                              |
|Alarms                                      |
|Odor (Strange / Unknown)                    |
|Citizen Assist / Service Call               |
|HazMat                                      |
|Watercraft in Distress                      |
|Explosion                                   |
|Oil Spill                                   |
|Vehicle Fire                                |
|Suspicious Package                          |
|Extrication / Entrapped (Machinery, Vehicle)|
|Other                                       |
|Outside Fire                                |
|Traffic Collision                           |
|Assist Polic

## Q8. Find all responses where the delay is greater than 5 minutes

Hint:
1. Rename the column `Delay` -> `ReponseDelayedinMins`
2. Return a new DataFrame
3. Display all calls where the response to the incident site was delayed by more than 5 minutes

Here we create a new DataFrame (`DataFireEdit`) by renaming the `Delay` column to `ReponseDelayedinMins`.

In [10]:
val DataFireEdit=DataFire.withColumnRenamed("Delay","ReponseDelayedinMins")

[36mDataFireEdit[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]

After creating the new DataFrame, we filter all calls where the response to the incident site was delayed by more than 5 minutes.

In [11]:
// Strictly greater than 5 minutes, compared as a number rather than against the string '5'
DataFireEdit.filter(col("ReponseDelayedinMins") > 5).show(false)

+----------+------+--------------+-----------------------------+----------+----------+--------------------+----------------------+------------------------------+----+-------+---------+-----------+----+----------------+--------+-------------+-------+-------------+---------+--------------+--------------------------+----------------------+------------------+------------------------------+-------------------------------------+-------------+--------------------+
|CallNumber|UnitID|IncidentNumber|CallType                     |CallDate  |WatchDate |CallFinalDisposition|AvailableDtTm         |Address                       |City|Zipcode|Battalion|StationArea|Box |OriginalPriority|Priority|FinalPriority|ALSUnit|CallTypeGroup|NumAlarms|UnitType      |UnitSequenceInCallDispatch|FirePreventionDistrict|SupervisorDistrict|Neighborhood                  |Location                             |RowID        |ReponseDelayedinMins|
+----------+------+--------------+-----------------------------+----------+-

|20180191  |91    |2005353       |Other                        |01/18/2002|01/18/2002|Other               |01/18/2002 12:51:51 PM|0 Block of LECH WALESA ST     |SF  |94102  |B02      |36         |0381|3               |3       |3            |false  |null         |1        |MEDIC         |1                         |2                     |6                 |Tenderloin                    |(37.7779246419176, -122.418936072128)|020180191-91 |7.983333            |
|20180382  |M36   |2005480       |Medical Incident             |01/18/2002|01/18/2002|Other               |01/18/2002 07:33:58 PM|2700 Block of 16TH ST         |SF  |94110  |B02      |07         |5237|2               |2       |2            |true   |null         |1        |MEDIC         |1                         |2                     |9                 |Mission                       |(37.7653263612606, -122.414201573327)|020180382-M36|13.55               |
|20190062  |E14   |2005599       |Medical Incident             |01/19/2002|0

## Q9. Convert the date columns to timestamps

Hint:
* `CallDate` -> `IncidentDate`
* `WatchDate` -> `OnWatchDate`
* `AvailableDtTm` -> `AvailableDtTS`
Example code for the `CallDate` case:
`dataframe.withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy")).drop("CallDate")`

The columns `CallDate`, `WatchDate` and `AvailableDtTm` are strings.
We create three new columns, `IncidentDate`, `OnWatchDate` and `AvailableDtTS`, which replace `CallDate`, `WatchDate` and `AvailableDtTm` respectively.

In [12]:
// CallDate and WatchDate hold dates only ("MM/dd/yyyy")
val Data1 = DataFireEdit.withColumn("IncidentDate", to_timestamp(col("CallDate"), "MM/dd/yyyy")).drop("CallDate")
val Data2 = Data1.withColumn("OnWatchDate", to_timestamp(col("WatchDate"), "MM/dd/yyyy")).drop("WatchDate")
// AvailableDtTm also holds a 12-hour time with an AM/PM marker, e.g. "01/12/2002 07:54:42 PM"
val newDataFire = Data2.withColumn("AvailableDtTS", to_timestamp(col("AvailableDtTm"), "MM/dd/yyyy hh:mm:ss a")).drop("AvailableDtTm")

[36mData1[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]
[36mData2[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]
[36mnewDataFire[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]

In [13]:
newDataFire.printSchema()

root
 |-- CallNumber: integer (nullable = true)
 |-- UnitID: string (nullable = true)
 |-- IncidentNumber: integer (nullable = true)
 |-- CallType: string (nullable = true)
 |-- CallFinalDisposition: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Zipcode: integer (nullable = true)
 |-- Battalion: string (nullable = true)
 |-- StationArea: string (nullable = true)
 |-- Box: string (nullable = true)
 |-- OriginalPriority: string (nullable = true)
 |-- Priority: string (nullable = true)
 |-- FinalPriority: integer (nullable = true)
 |-- ALSUnit: boolean (nullable = true)
 |-- CallTypeGroup: string (nullable = true)
 |-- NumAlarms: integer (nullable = true)
 |-- UnitType: string (nullable = true)
 |-- UnitSequenceInCallDispatch: integer (nullable = true)
 |-- FirePreventionDistrict: string (nullable = true)
 |-- SupervisorDistrict: string (nullable = true)
 |-- Neighborhood: string (nullable = true)
 |-- Location: string (nullable =

Running **newDataFire.printSchema()** shows that `CallDate`, `WatchDate` and `AvailableDtTm` have been replaced by `IncidentDate`, `OnWatchDate` and `AvailableDtTS` respectively, with their type changed from string to timestamp.
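The `AvailableDtTm` strings (e.g. `01/12/2002 07:54:42 PM` in the output of Q5) carry a 12-hour time and an AM/PM marker, which the pattern `MM/dd/yyyy` does not describe; depending on the Spark version and parser settings, parsing them with that pattern may silently drop the time or fail. The sketch below uses `java.time` outside Spark to check the pattern letters against sample strings; it assumes an English locale (`Locale.US`) for the `AM`/`PM` marker.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Checks the date patterns with java.time, outside Spark.
object FireDatePatterns {
  // CallDate and WatchDate: date only, e.g. "01/12/2002".
  val datePattern: DateTimeFormatter =
    DateTimeFormatter.ofPattern("MM/dd/yyyy", Locale.US)

  // AvailableDtTm: date plus 12-hour time and AM/PM marker, e.g. "01/12/2002 07:54:42 PM".
  // Locale.US is assumed so that "AM"/"PM" are recognized.
  val dateTimePattern: DateTimeFormatter =
    DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US)

  def parseDate(s: String): LocalDate = LocalDate.parse(s, datePattern)

  def parseDateTime(s: String): LocalDateTime = LocalDateTime.parse(s, dateTimePattern)
}
```

With `java.time`, the date-only pattern rejects the full `AvailableDtTm` string outright because of the unparsed trailing text, which makes the mismatch easy to spot.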

## Q10. What are the most common call types?

In [14]:
import spark.implicits._
newDataFire
            .groupBy("CallType")
            .count()
            .orderBy($"count".desc)
            .show(truncate=false)

+-------------------------------+------+
|CallType                       |count |
+-------------------------------+------+
|Medical Incident               |113794|
|Structure Fire                 |23319 |
|Alarms                         |19406 |
|Traffic Collision              |7013  |
|Citizen Assist / Service Call  |2524  |
|Other                          |2166  |
|Outside Fire                   |2094  |
|Vehicle Fire                   |854   |
|Gas Leak (Natural and LP Gases)|764   |
|Water Rescue                   |755   |
|Odor (Strange / Unknown)       |490   |
|Electrical Hazard              |482   |
|Elevator / Escalator Rescue    |453   |
|Smoke Investigation (Outside)  |391   |
|Fuel Spill                     |193   |
|HazMat                         |124   |
|Industrial Accidents           |94    |
|Explosion                      |89    |
|Train / Rail Incident          |57    |
|Aircraft Emergency             |36    |
+-------------------------------+------+
only showing top

[32mimport [39m[36mspark.implicits._
[39m

The output shows that the most common call types are **Medical Incident**, **Structure Fire** and **Alarms**.

## Q11. Which boxes (`Box` column) appear in the most common call types?

In [15]:
val MI=newDataFire
                 .select("CallType","Box")
                 .where("CallType == 'Medical Incident'")
val SF=newDataFire
                 .select("CallType","Box")
                 .where("CallType == 'Structure Fire'")
val AL=newDataFire
                 .select("CallType","Box")
                 .where("CallType == 'Alarms'")
MI.show()
SF.show()
AL.show()

+----------------+----+
|        CallType| Box|
+----------------+----+
|Medical Incident|6495|
|Medical Incident|1455|
|Medical Incident|3513|
|Medical Incident|5415|
|Medical Incident|5525|
|Medical Incident|1557|
|Medical Incident|7173|
|Medical Incident|8635|
|Medical Incident|7145|
|Medical Incident|7145|
|Medical Incident|1311|
|Medical Incident|1153|
|Medical Incident|8734|
|Medical Incident|6244|
|Medical Incident|7113|
|Medical Incident|4252|
|Medical Incident|5226|
|Medical Incident|4256|
|Medical Incident|3361|
|Medical Incident|5246|
+----------------+----+
only showing top 20 rows



+--------------+----+
|      CallType| Box|
+--------------+----+
|Structure Fire|3362|
|Structure Fire|2122|
|Structure Fire|6218|
|Structure Fire|5472|
|Structure Fire|5472|
|Structure Fire|5472|
|Structure Fire|1544|
|Structure Fire|2335|
|Structure Fire|1554|
|Structure Fire|1455|
|Structure Fire|5467|
|Structure Fire|7214|
|Structure Fire|7214|
|Structure Fire|6641|
|Structure Fire|6213|
|Structure Fire|7475|
|Structure Fire|1646|
|Structure Fire|5472|
|Structure Fire|7252|
|Structure Fire|7252|
+--------------+----+
only showing top 20 rows



+--------+----+
|CallType| Box|
+--------+----+
|  Alarms|3223|
|  Alarms|8324|
|  Alarms|3114|
|  Alarms|5278|
|  Alarms|3632|
|  Alarms|1544|
|  Alarms|1544|
|  Alarms|3523|
|  Alarms|2211|
|  Alarms|4242|
|  Alarms|2178|
|  Alarms|3545|
|  Alarms|1225|
|  Alarms|2318|
|  Alarms|2143|
|  Alarms|2252|
|  Alarms|1165|
|  Alarms|2253|
|  Alarms|1246|
|  Alarms|1647|
+--------+----+
only showing top 20 rows



[36mMI[39m: [32mDataset[39m[[32mRow[39m] = [CallType: string, Box: string]
[36mSF[39m: [32mDataset[39m[[32mRow[39m] = [CallType: string, Box: string]
[36mAL[39m: [32mDataset[39m[[32mRow[39m] = [CallType: string, Box: string]

## Q12. Which San Francisco neighborhoods have the zip codes `94102` and `94103`?

In [16]:
newDataFire.select(
    "Zipcode",
    "Neighborhood"
).where(($"Zipcode" === 94102 || $"Zipcode" === 94103) && $"City"==="San Francisco")
.distinct().show(truncate=false)

+-------+------------------------------+
|Zipcode|Neighborhood                  |
+-------+------------------------------+
|94103  |South of Market               |
|94102  |Financial District/South Beach|
|94102  |South of Market               |
|94103  |Castro/Upper Market           |
|94102  |Hayes Valley                  |
|94103  |Financial District/South Beach|
|94102  |Tenderloin                    |
|94102  |Nob Hill                      |
|94103  |Mission                       |
|94103  |Mission Bay                   |
|94102  |Western Addition              |
|94103  |Hayes Valley                  |
|94103  |Tenderloin                    |
|94103  |Potrero Hill                  |
+-------+------------------------------+
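The rows shown in Q5 and Q8 have `City` = `SF`, while this query keeps only `City === "San Francisco"`, so responses recorded under `SF` are excluded (the same filter is used in Q16). A more inclusive filter would accept every spelling of the city, e.g. `$"City".isin("SF", "San Francisco")`; the full list of spellings can be checked with `newDataFire.select("City").distinct()`. The plain-Scala sketch below illustrates the idea; the alias set is an assumption and may be incomplete, and the rows are hypothetical.

```scala
// Plain-Scala sketch of a City filter that accepts several spellings.
object CityFilter {
  // Assumed spellings: "SF" appears in the sample output, "San Francisco" in the query; there may be others.
  val sanFranciscoAliases: Set[String] = Set("SF", "San Francisco")

  final case class Row(zipcode: Int, neighborhood: String, city: String)

  // Distinct (zipcode, neighborhood) pairs for the requested zip codes, under any alias of the city.
  def neighborhoodsFor(rows: Seq[Row], zips: Set[Int]): Set[(Int, String)] =
    rows
      .filter(r => zips.contains(r.zipcode) && sanFranciscoAliases.contains(r.city))
      .map(r => (r.zipcode, r.neighborhood))
      .toSet

  // Hypothetical rows mixing both spellings.
  val sample: Seq[Row] = Seq(
    Row(94102, "Tenderloin", "SF"),
    Row(94103, "Mission", "San Francisco"),
    Row(94110, "Mission", "SF")
  )
}
```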



## Q13. Determine the total number of calls, as well as the mean, minimum and maximum response time of the calls

In [17]:
newDataFire.select("ReponseDelayedinMins").describe().show

+-------+--------------------+
|summary|ReponseDelayedinMins|
+-------+--------------------+
|  count|              175296|
|   mean|   3.892364154521585|
| stddev|   9.378286226254206|
|    min|         0.016666668|
|    max|             1844.55|
+-------+--------------------+
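`describe()` reports `count` as the number of non-null `ReponseDelayedinMins` values, i.e. responses rather than calls; since one call can have several responses, the number of distinct `CallNumber` values is a closer measure of the total number of calls. In Spark both could be computed in one pass with `newDataFire.agg(count(lit(1)), countDistinct("CallNumber"), avg("ReponseDelayedinMins"), min("ReponseDelayedinMins"), max("ReponseDelayedinMins"))`. The plain-Scala sketch below reproduces these aggregates on hypothetical rows, skipping missing delays as Spark's `avg`, `min` and `max` do.

```scala
// Plain-Scala sketch of the Q13 aggregates on hypothetical rows.
object DelayStats {
  final case class Response(callNumber: Int, delayMins: Option[Double])

  final case class Summary(rows: Int, distinctCalls: Int, mean: Double, min: Double, max: Double)

  def summarize(rs: Seq[Response]): Summary = {
    // Like Spark's avg/min/max, missing delays (None) are skipped.
    val delays: Seq[Double] = rs.flatMap(_.delayMins)
    require(delays.nonEmpty, "no delay values to summarize")
    Summary(
      rows = rs.size,                                     // like count(lit(1))
      distinctCalls = rs.map(_.callNumber).distinct.size, // like countDistinct("CallNumber")
      mean = delays.sum / delays.size,
      min = delays.foldLeft(Double.PositiveInfinity)((m, d) => math.min(m, d)),
      max = delays.foldLeft(Double.NegativeInfinity)((m, d) => math.max(m, d))
    )
  }

  // Invented sample: call 1 has two responses, call 2 has one with no recorded delay.
  val sample: Seq[Response] =
    Seq(Response(1, Some(2.0)), Response(1, Some(4.0)), Response(2, None))
}
```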



## Q14. How many distinct years are there in this dataset?

In [18]:
newDataFire
            .select((year($"IncidentDate")) as "Les differentes Annees")
            .distinct()
            .orderBy(year($"IncidentDate").asc)
            .show(19)

+----------------------+
|Les differentes Annees|
+----------------------+
|                  2000|
|                  2001|
|                  2002|
|                  2003|
|                  2004|
|                  2005|
|                  2006|
|                  2007|
|                  2008|
|                  2009|
|                  2010|
|                  2011|
|                  2012|
|                  2013|
|                  2014|
|                  2015|
|                  2016|
|                  2017|
|                  2018|
+----------------------+



In [19]:
newDataFire.select(countDistinct(year($"IncidentDate")) as "nombre d'annee").show()

+--------------+
|nombre d'annee|
+--------------+
|            19|
+--------------+



## Q15. Which week of 2018 had the most fire calls?

In [20]:
newDataFire
          .select(weekofyear(to_date($"IncidentDate")) as "semaine")
          .where($"CallType"==="Structure Fire")
          .where(year($"IncidentDate") === 2018)
          .groupBy($"semaine")
          .count()
          .orderBy($"count".desc)
          .show(2,truncate=false)

+-------+-----+
|semaine|count|
+-------+-----+
|25     |31   |
|8      |30   |
+-------+-----+
only showing top 2 rows



Here we only display the 2 weeks of 2018 with the most fire calls, counting `Structure Fire` calls only.
Week **25** had the most calls (31), just ahead of week 8 (30).
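`weekofyear` follows ISO-8601 weeks (weeks start on Monday, and week 1 is the first week with at least four days in the new year), while `year` returns the calendar year. The two can disagree around New Year: 31 December 2018 falls in ISO week 1 of 2019, so the query above would assign a call from that day to week 1 while `year()` still returns 2018. The `java.time` sketch below shows this outside Spark:

```scala
import java.time.LocalDate
import java.time.temporal.IsoFields

// ISO-8601 week numbering with java.time, outside Spark.
object IsoWeeks {
  // Week number (1..53); weeks start on Monday, week 1 has at least four days in the new year.
  def weekOfYear(d: LocalDate): Int = d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)

  // The year that ISO week belongs to, which can differ from the calendar year.
  def weekBasedYear(d: LocalDate): Int = d.get(IsoFields.WEEK_BASED_YEAR)
}
```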

## Q16. Which San Francisco neighborhoods had the worst response times in 2018?

In [21]:
// Your code goes here
newDataFire
          .select("Neighborhood","ReponseDelayedinMins")
          .where($"City"==="San Francisco")
          .where(year($"IncidentDate") === 2018)
          .orderBy($"ReponseDelayedinMins".desc)
          .show(truncate=false)

+------------------------------+--------------------+
|Neighborhood                  |ReponseDelayedinMins|
+------------------------------+--------------------+
|Chinatown                     |491.26666           |
|Financial District/South Beach|406.63333           |
|Tenderloin                    |340.48334           |
|Haight Ashbury                |175.86667           |
|Bayview Hunters Point         |155.8               |
|Financial District/South Beach|135.51666           |
|Pacific Heights               |129.01666           |
|Potrero Hill                  |109.8               |
|Inner Sunset                  |106.13333           |
|South of Market               |94.71667            |
|Bayview Hunters Point         |92.816666           |
|South of Market               |91.666664           |
|Inner Richmond                |90.433334           |
|Excelsior                     |83.76667            |
|South of Market               |76.9                |
|Tenderloin                 

Sorted by the longest individual response, the three neighborhoods with the worst response times in 2018 are **Chinatown**, **Financial District/South Beach** and **Tenderloin**.
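The query above ranks individual responses, which is why **Financial District/South Beach** and **South of Market** appear several times. Ranking neighborhoods themselves requires an aggregation per neighborhood, e.g. `groupBy("Neighborhood").agg(max("ReponseDelayedinMins"), avg("ReponseDelayedinMins"))` in Spark, and the result depends on the chosen metric. The plain-Scala sketch below, on hypothetical rows, shows the worst neighborhood changing between the maximum and the average delay.

```scala
// Plain-Scala sketch: rank neighborhoods after aggregating their delays.
object WorstNeighborhoods {
  // Delays as Double here; the Spark column is a Float.
  final case class Response(neighborhood: String, delayMins: Double)

  // Aggregates each neighborhood's delays with `metric`, then sorts worst first.
  def rankBy(rs: Seq[Response], metric: Seq[Double] => Double): Seq[(String, Double)] =
    rs.groupBy(_.neighborhood)
      .toSeq
      .map { case (name, rows) => name -> metric(rows.map(_.delayMins)) }
      // Largest value first; ties broken alphabetically for a deterministic order.
      .sortWith((x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1))

  val maxDelay: Seq[Double] => Double =
    ds => ds.foldLeft(Double.NegativeInfinity)((m, d) => math.max(m, d))

  val meanDelay: Seq[Double] => Double =
    ds => ds.sum / ds.size

  // Hypothetical rows: "A" has one very slow response, "B" is consistently slow.
  val sample: Seq[Response] = Seq(
    Response("A", 10.0), Response("A", 1.0),
    Response("B", 6.0), Response("B", 6.0)
  )
}
```

With the maximum, "A" ranks worst; with the average (5.5 vs 6.0), "B" does, so the metric should be stated alongside the answer.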

## Q17. Store the data as Parquet files

In [23]:
val outputPath = "data/sf-fire-calls.parquet"

newDataFire.write.mode("overwrite").parquet(outputPath)

[36moutputPath[39m: [32mString[39m = [32m"data/sf-fire-calls.parquet"[39m

## Q18. Reload the data stored in Parquet format

In [24]:
val Base=spark.read.parquet(outputPath)
Base.printSchema()

root
 |-- CallNumber: integer (nullable = true)
 |-- UnitID: string (nullable = true)
 |-- IncidentNumber: integer (nullable = true)
 |-- CallType: string (nullable = true)
 |-- CallFinalDisposition: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Zipcode: integer (nullable = true)
 |-- Battalion: string (nullable = true)
 |-- StationArea: string (nullable = true)
 |-- Box: string (nullable = true)
 |-- OriginalPriority: string (nullable = true)
 |-- Priority: string (nullable = true)
 |-- FinalPriority: integer (nullable = true)
 |-- ALSUnit: boolean (nullable = true)
 |-- CallTypeGroup: string (nullable = true)
 |-- NumAlarms: integer (nullable = true)
 |-- UnitType: string (nullable = true)
 |-- UnitSequenceInCallDispatch: integer (nullable = true)
 |-- FirePreventionDistrict: string (nullable = true)
 |-- SupervisorDistrict: string (nullable = true)
 |-- Neighborhood: string (nullable = true)
 |-- Location: string (nullable =

[36mBase[39m: [32mDataFrame[39m = [CallNumber: int, UnitID: string ... 26 more fields]