# Working with rows

## Download and install Spark

## Downloading and preprocessing Chicago's Reported Crime Data

In [3]:
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate() 

In [8]:
from pyspark.sql.functions import col, lit, to_timestamp

path = "../datasets/sparkbyexamples/police-stations.csv"
# Parse the Date strings into timestamps, then keep rows up to the cutoff date.
rc = (
    spark.read.csv(path, header=True)
    .withColumn('Date', to_timestamp(col('Date'), 'MM/dd/yyyy hh:mm:ss a'))
    .filter(col('Date') <= lit('2023-11-11'))
)
rc.show(5)

+--------+-----------+-------------------+--------------------+----+-----------------+--------------------+--------------------+------+--------+----+--------+----+--------------+--------+------------+------------+----+--------------------+------------+-------------+--------------------+
|      ID|Case Number|               Date|               Block|IUCR|     Primary Type|         Description|Location Description|Arrest|Domestic|Beat|District|Ward|Community Area|FBI Code|X Coordinate|Y Coordinate|Year|          Updated On|    Latitude|    Longitude|            Location|
+--------+-----------+-------------------+--------------------+----+-----------------+--------------------+--------------------+------+--------+----+--------+----+--------------+--------+------------+------------+----+--------------------+------------+-------------+--------------------+
|12592454|   JF113025|2022-01-14 15:55:00|   067XX S MORGAN ST|2826|    OTHER OFFENSE|HARASSMENT BY ELE...|           RESIDENCE| false| 

## Working with rows

**Add the reported crimes for an additional day, 12-Nov-2022, to our dataset.**

In [13]:
rc = spark.read.csv(path,header=True)
rc.select("Date").distinct().show()

+--------------------+
|                Date|
+--------------------+
|08/05/2022 09:26:...|
|01/03/2022 05:00:...|
|02/03/2022 02:30:...|
|12/27/2022 04:35:...|
|11/18/2022 04:00:...|
|11/30/2022 09:45:...|
|12/27/2022 11:21:...|
|07/14/2022 08:52:...|
|08/29/2022 12:00:...|
|09/02/2022 10:00:...|
|03/16/2022 02:00:...|
|08/07/2022 06:00:...|
|08/09/2022 09:00:...|
|09/02/2022 08:15:...|
|04/13/2022 12:00:...|
|01/29/2022 02:00:...|
|03/01/2022 01:15:...|
|01/01/2022 02:00:...|
|02/08/2022 03:27:...|
|02/10/2022 03:50:...|
+--------------------+
only showing top 20 rows



In [9]:
rc.count()

239602

**FILTER DAYS 2022-11-12 and 2022-11-13**

In [17]:
path = "../datasets/sparkbyexamples/police-stations.csv"
one_day = (
    spark.read.csv(path, header=True)
    .withColumn('Date', to_timestamp(col('Date'), 'MM/dd/yyyy hh:mm:ss a'))
    .filter(col('Date') == lit('2022-11-12'))
)
one_day.count()

23

In [18]:
path = "../datasets/sparkbyexamples/police-stations.csv"
one_day2 = (
    spark.read.csv(path, header=True)
    .withColumn('Date', to_timestamp(col('Date'), 'MM/dd/yyyy hh:mm:ss a'))
    .filter(col('Date') == lit('2022-11-13'))
)
one_day2.count()

31

**UNION : APPEND ROWS**

In [19]:
one_day.union(one_day2).orderBy('Date', ascending=False).count()

54

**COUNT and GROUP BY**

In [14]:
rc.groupBy('Primary Type').count().show()

+--------------------+-------+
|        Primary Type|  count|
+--------------------+-------+
|OFFENSE INVOLVING...|  46437|
|CRIMINAL SEXUAL A...|   1081|
|            STALKING|   3388|
|PUBLIC PEACE VIOL...|  47785|
|           OBSCENITY|    585|
|NON-CRIMINAL (SUB...|      9|
|               ARSON|  11158|
|   DOMESTIC VIOLENCE|      1|
|            GAMBLING|  14422|
|   CRIMINAL TRESPASS| 193371|
|             ASSAULT| 418517|
|      NON - CRIMINAL|     38|
|LIQUOR LAW VIOLATION|  14068|
| MOTOR VEHICLE THEFT| 314131|
|               THEFT|1418481|
|             BATTERY|1232265|
|             ROBBERY| 255600|
|            HOMICIDE|   9478|
|           RITUALISM|     23|
|    PUBLIC INDECENCY|    161|
+--------------------+-------+
only showing top 20 rows



**What are the top 10 primary types by number of reported crimes, in descending order of occurrence?**

In [16]:
rc.groupBy('Primary Type').count().orderBy('count', ascending=False).show(10)

+-------------------+-------+
|       Primary Type|  count|
+-------------------+-------+
|              THEFT|1418481|
|            BATTERY|1232265|
|    CRIMINAL DAMAGE| 771507|
|          NARCOTICS| 711758|
|      OTHER OFFENSE| 418890|
|            ASSAULT| 418517|
|           BURGLARY| 388040|
|MOTOR VEHICLE THEFT| 314131|
| DECEPTIVE PRACTICE| 266781|
|            ROBBERY| 255600|
+-------------------+-------+
only showing top 10 rows

