# Introduction
There is a lot of hype about data analysts in the media today, and rightfully so. Many companies are looking to recruit people who can analyse large quantities of data. Over the past few years, the amount of data that these companies collect has grown very quickly. Hidden in these vast amounts of data are insights that can help managers make better decisions. As a result, companies are turning to data analysts to help them uncover these insights.

Data analysts need two skills. First, they need to write code. This does not mean that they need to be programmers. A common mistake is thinking that data analysts need to come from a computer science background. This is not true. Today, there are many libraries and packages that make the job of the analyst much easier when it comes to writing code. Most courses on data analysis start by providing the student with a background in programming. This course is different. The student will learn the coding as we go along. The course will focus on analytical skills. The coding will be introduced when needed, and even then, we will rely on libraries instead of writing complicated code. 

Another skill that data analysts need is statistics. Unfortunately, in my experience many data analysts do not have a good background in statistics. To them, data analysis is just about plotting graphs and training models. As a result, their work often does not make much sense: the only way to know which tool to use is to understand how these tools work, and that requires an understanding of statistics. Fortunately, you do not need to be a mathematician to understand statistics, at least not at the level required of a data analyst. As with coding, this course will introduce the various statistical concepts as we move along. In other words, as the course progresses, we will be talking about coding and statistics at the same time, introducing new concepts when they are needed.

The goal of this course is to provide beginning students with the foundation they need to start analysing data on their own using Python. This course assumes no previous knowledge of Python, statistics, or data structures. It will not make you an expert. No single course can take you from A to Z, although many courses claim to do so. However, if you are thinking about pursuing a career as a data analyst, this course is a good place to start. Concepts will be explained simply, through examples, and introduced on a need-to-know basis. There is no separate chapter about Python and no separate chapter about statistics; instead, you will learn both at the same time. The further you go in this course, the more you will learn about Python and statistics.

# What You Need to Install

To follow this course, you will need to be able to run the code. You can do that using <b>Anaconda</b>, and you have two options: either install Anaconda on your computer and work locally, or work in the cloud without installing anything. Working in the cloud is the easiest option, since you do not need to install or configure anything. All you have to do is go to https://www.anaconda.com/code-in-the-cloud, create an account, and start coding. That's all.

If you would rather download what you need and work on your own computer, that is also easy. Go to https://www.anaconda.com/products/distribution, then download and install Anaconda. Once you have done that, start Anaconda Navigator by searching for "Anaconda" in your computer's search bar. Its icon is a green hollow circle. Click on it and wait a little while your computer launches it. You should then see a screen similar to the one shown in the figure below.

![Anaconda navigator](https://drive.google.com/uc?id=1DKR_l-_5FaCwS1GuN1pllFBIwuV-12NF)

In the navigator, launch <b>JupyterLab</b> and wait a few seconds. JupyterLab should open in your web browser, where you can create a new notebook.

# Libraries

On your local computer, you can install libraries by going to the <i>Environments</i> tab in Anaconda navigator, as shown in the figure below.

![Anaconda libraries](https://drive.google.com/uc?id=1Em8HPTjIK7jKVG41UFTqzRhpBbORluuU)

Here you can see a list of installed libraries. As you can see, Anaconda comes with a long list of libraries already installed. If you want to use a library that is not installed, choose <i>Not installed</i> from the drop-down box and search for the library using the search bar on the right. Select the library and click <i>Apply</i> to install it. Once a library is installed, you can import it and use it in your notebook. We will soon import the <i>pandas</i> library, since it is the main library that we will use. To check that this library is already installed, choose <i>Installed</i> in the drop-down box in Anaconda Navigator and search for pandas. You should see <i>pandas</i> in the list.

# A First Look at the Pandas Library
The reason why you do not need to be a professional programmer to become a data analyst is libraries. Libraries do a lot of the work for you. Over the past few years, many useful libraries have been developed. These libraries have made the job of the data analyst less about programming and more about analyzing the data. 

Let us take a look at our first library, which is one of the most important. This library is called <i>Pandas</i>:

```python
import pandas as pd
```

In Python, whenever you want to use a library, you have to import it. This is done with the <b>import</b> statement. In the above command, we told Python that we want to import the <i>pandas</i> library and that we want to call it <i>pd</i>. We could have just typed <b>import pandas</b>, but since we have to type the library's name every time we use one of its functions, it is much easier to use a shorter name.
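If you want to convince yourself that <i>pd</i> is only a second, shorter name for the same library, the small check below (a sketch, not part of the original course code) imports pandas both ways and compares the two names:

```python
import pandas as pd   # the short alias used throughout this course
import pandas         # the same library under its full name

# Both names point to the very same module object,
# so pd.read_csv and pandas.read_csv are the same function.
print(pd is pandas)                    # True
print(pd.read_csv is pandas.read_csv)  # True
```

In practice you only need the first line; the alias <i>pd</i> is simply a widely used convention.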

Once you execute the import command (press Ctrl+Enter), you will have access to all the useful functions of this library. Let us use one of these functions to read a data set.

```python
first_data_set = pd.read_csv("https://data.cityofchicago.org/api/views/9hwr-2zxp/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
```

The <b>read_csv()</b> function is a very useful function that is part of the pandas library. Notice that to use a function in a library, you type the name of the library (or the short alias that we chose for it), followed by a '.', followed by the name of the function that you want to use. In our case the function is read_csv(). As the name suggests, this function reads a data set that is stored in CSV (comma-separated values) format. The data set is stored at the web address that we gave the function as input (inside the brackets). So, in short, we are using the pandas library to download a data set that is stored as a CSV file.
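The input to read_csv() does not have to be a web address. As a minimal sketch that runs without an internet connection, the example below feeds it a few made-up rows (hypothetical values, not taken from the Chicago file) through <i>io.StringIO</i>, which makes a piece of text behave like a file:

```python
import io
import pandas as pd

# Hypothetical mini CSV with 4 columns (the real Chicago file has 22).
# The last row has no Ward value, so that cell will be empty (missing).
csv_text = """ID,Primary Type,Arrest,Ward
1001,THEFT,False,48.0
1002,ROBBERY,True,16.0
1003,ASSAULT,False,
"""

# read_csv() accepts a URL, a file path, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

Whatever the source, the result is the same kind of object: a pandas dataframe.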

In Python, to save something we simply assign it to a variable. When we called read_csv() with the Chicago web address, the function returned a data set, which we saved in the variable <i>first_data_set</i>. We now use this variable to refer to the data set. The data set is stored as a pandas dataframe (pandas calls this type DataFrame) named first_data_set. We can now use the many functions that come with pandas dataframes to look at and analyse the data. One of these functions is the <b>head()</b> function.

```python
first_data_set.head()
```

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12789052 | JF350580 | 08/09/2022 04:07:00 PM | 014XX W ELMDALE AVE | 0325 | ROBBERY | VEHICULAR HIJACKING | STREET | True | False | ... | 48.0 | 77 | 03 | 1165640.0 | 1939961.0 | 2022 | 01/03/2023 03:46:28 PM | 41.990846 | -87.666096 | (41.990846423, -87.666096144) |
| 1 | 12790581 | JF352712 | 08/10/2022 04:00:00 PM | 062XX S ARTESIAN AVE | 0810 | THEFT | OVER \$500 | STREET | False | False | ... | 16.0 | 66 | 06 | 1161110.0 | 1863210.0 | 2022 | 01/03/2023 03:46:28 PM | 41.780331 | -87.684892 | (41.780330681, -87.684891779) |
| 2 | 12790652 | JF352659 | 08/11/2022 10:00:00 AM | 094XX S STATE ST | 0810 | THEFT | OVER \$500 | STREET | False | True | ... | 9.0 | 49 | 06 | 1177962.0 | 1842197.0 | 2022 | 01/03/2023 03:46:28 PM | 41.722303 | -87.623745 | (41.722303228, -87.623745129) |
| 3 | 12796135 | JF359082 | 08/15/2022 09:14:00 PM | 048XX S KARLOV AVE | 0560 | ASSAULT | SIMPLE | RESIDENCE | False | False | ... | 14.0 | 57 | 08A | 1149844.0 | 1872244.0 | 2022 | 01/03/2023 03:46:28 PM | 41.805347 | -87.725961 | (41.805347066, -87.725961264) |
| 4 | 12795972 | JF359058 | 08/16/2022 04:10:00 PM | 015XX S HALSTED ST | 0820 | THEFT | \$500 AND UNDER | SIDEWALK | False | False | ... | 11.0 | 28 | 06 | 1171290.0 | 1892413.0 | 2022 | 01/03/2023 03:46:28 PM | 41.860250 | -87.646715 | (41.860249838, -87.64671467) |


The head() function displays the first five rows of the dataframe. This is useful because we do not yet know what the data set contains. Looking at the output above, we see the names of the columns. The data set has 22 columns, but pandas hides some of them to save space; the "..." column marks where they were left out. Take a look at the column named <i>Primary Type</i>. The data stored in this column describes crimes (ROBBERY, THEFT, ASSAULT, etc.), so it seems that this data set has something to do with crimes. We also see a column called <i>FBI Code</i> and a column named <i>Location</i>. Perhaps this column provides the location where the crime took place. Maybe.

A very important thing to notice here is that some of the columns hold numerical values while others hold text. This matters because the kind of analysis we can perform on a column depends on the type of values it contains. We can look at the column names and types using the <b>dtypes</b> attribute:

```python
first_data_set.dtypes
```

```
ID                        int64
Case Number              object
Date                     object
Block                    object
IUCR                     object
Primary Type             object
Description              object
Location Description     object
Arrest                     bool
Domestic                   bool
Beat                      int64
District                  int64
Ward                    float64
Community Area            int64
FBI Code                 object
X Coordinate            float64
Y Coordinate            float64
Year                      int64
Updated On               object
Latitude                float64
Longitude               float64
Location                 object
dtype: object
```

Notice that we called dtypes an attribute. As you can see from the command, there are no brackets after <i>dtypes</i>. Functions must be followed by brackets, while attributes are not. The dtypes attribute of the dataframe lists the name and type of every column. <i>ID</i>, for example, is of type <i>int64</i>, which simply means an integer (a whole number, with no decimal point) stored using 64 bits. The <i>Case Number</i> column is of type <i>object</i>, which here means that it holds text. The <i>Arrest</i> column is of type <i>bool</i> (Boolean), which means that it takes only two values, <i>True</i> or <i>False</i>. The <i>X Coordinate</i> column is of type <i>float64</i>, which is a number with a decimal point.
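To see these four types side by side, here is a small sketch with a hand-made dataframe (the values are copied from a few cells above purely for illustration). It also shows what happens if you put brackets after the attribute by mistake. Text columns show up as <i>object</i> in the pandas version used in this course; newer pandas releases may label them differently.

```python
import pandas as pd

# Hypothetical two-row dataframe with one column of each type discussed above.
df = pd.DataFrame({
    "ID": [12789052, 12790581],               # whole numbers -> int64
    "Case Number": ["JF350580", "JF352712"],  # text          -> object
    "Arrest": [True, False],                  # True/False    -> bool
    "X Coordinate": [1165640.0, 1161110.0],   # decimals      -> float64
})

print(df.dtypes)   # attribute: no brackets

# dtypes is not a function, so calling it with brackets raises a TypeError.
called_ok = True
try:
    df.dtypes()
except TypeError as err:
    called_ok = False
    print("TypeError:", err)
```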

Let us take another look at the data set, but this time instead of displaying the top five rows, let us display the last five rows so that we can introduce a new function:

```python
first_data_set.tail()
```

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 238312 | 12936285 | JF526139 | 06/27/2022 10:05:00 AM | 025XX N HALSTED ST | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER \$ 300 | NaN | False | False | ... | 43.0 | 7 | 11 | 1170513.0 | 1917030.0 | 2022 | 01/03/2023 03:46:28 PM | 41.927817 | -87.648846 | (41.927817456, -87.648845932) |
| 238313 | 12936301 | JF526810 | 12/22/2022 06:00:00 PM | 020XX W CORNELIA AVE | 1320 | CRIMINAL DAMAGE | TO VEHICLE | STREET | False | False | ... | 32.0 | 5 | 14 | 1161968.0 | 1923233.0 | 2022 | 01/03/2023 03:46:28 PM | 41.945022 | -87.680072 | (41.945021752, -87.680071764) |
| 238314 | 12936397 | JF526745 | 12/19/2022 02:00:00 PM | 044XX N ROCKWELL ST | 0620 | BURGLARY | UNLAWFUL ENTRY | APARTMENT | False | False | ... | 47.0 | 4 | 05 | 1158237.0 | 1929586.0 | 2022 | 01/03/2023 03:46:28 PM | 41.962532 | -87.693611 | (41.962531969, -87.693611152) |
| 238315 | 12935341 | JF525383 | 12/20/2022 06:45:00 AM | 027XX W ROOSEVELT RD | 0810 | THEFT | OVER \$500 | STREET | False | False | ... | 28.0 | 29 | 06 | 1158071.0 | 1894595.0 | 2022 | 01/03/2023 03:46:28 PM | 41.866517 | -87.695179 | (41.866517317, -87.695178701) |
| 238316 | 12938501 | JF523997 | 12/26/2022 10:30:00 PM | 021XX W DEVON AVE | 0810 | THEFT | OVER \$500 | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | ... | 50.0 | 2 | 06 | 1160681.0 | 1942466.0 | 2022 | 03/22/2023 04:47:43 PM | 41.997825 | -87.684267 | (41.997824802, -87.684266677) |


We see that the <i>ID</i> column holds integers, the <i>Case Number</i> is a combination of letters and numbers, the <i>Arrest</i> column takes the values True and False, and the <i>X Coordinate</i> column is numerical with a decimal point. This matches the types reported by dtypes.
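Both head() and tail() show five rows by default, but they accept the number of rows you want as input. A related attribute, <i>shape</i>, gives the number of rows and columns at once. The sketch below uses a small made-up dataframe rather than the crime data:

```python
import pandas as pd

# Hypothetical dataframe with 8 rows and 2 columns (not real Chicago data).
df = pd.DataFrame({
    "ID": range(100, 108),
    "Primary Type": ["THEFT", "BATTERY"] * 4,
})

print(df.head(3))   # first 3 rows instead of the default 5
print(df.tail(2))   # last 2 rows
print(df.shape)     # (number of rows, number of columns) -> (8, 2)
```

Like dtypes, shape is an attribute, so it is written without brackets.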

So far we have looked at the first and last five rows, and we know that the dataframe contains data about crimes. In fact, the data is about crimes in Chicago (https://data.cityofchicago.org), so let us give the dataframe a more meaningful name.

```python
chicago_crimes = first_data_set.copy()
```

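Here, we used the <b>copy()</b> function to create a new copy of the dataframe and save it in the variable <i>chicago_crimes</i>. Without copy(), writing chicago_crimes = first_data_set would only give the same dataframe a second name, so a change made through one name would also show up through the other. From now on we will use this new variable since its name makes more sense.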
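The difference between a copy and a second name can be seen in a few lines. This sketch uses a hypothetical three-row dataframe and changes one value through each name:

```python
import pandas as pd

original = pd.DataFrame({"Ward": [48.0, 16.0, 9.0]})

alias = original               # no copy: a second name for the same dataframe
independent = original.copy()  # a separate dataframe holding the same values

alias.loc[0, "Ward"] = 99.0    # change the first value through the alias

print(original.loc[0, "Ward"])     # 99.0 -> the original changed as well
print(independent.loc[0, "Ward"])  # 48.0 -> the copy is unaffected
```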

# Looking More Closely at the Data

So far we have learned how to use the pandas library to download a data set from the internet, how to look at the column types of the data set, and how to display some of the rows in this data set. Notice that the coding has been very simple. This is because the pandas library makes our life much easier. This is why I said at the start that being a data analyst does not mean that you have to be an expert programmer.

We will now take further steps toward understanding what this data set is about. It is useful to get some general information about the data set, such as its size. We previously saw how to use the dtypes attribute to get the names and types of columns. There is a useful function that provides this information together with the number of rows and the number of non-missing values in each column:

```python
chicago_crimes.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238317 entries, 0 to 238316
Data columns (total 22 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   ID                    238317 non-null  int64  
 1   Case Number           238317 non-null  object 
 2   Date                  238317 non-null  object 
 3   Block                 238317 non-null  object 
 4   IUCR                  238317 non-null  object 
 5   Primary Type          238317 non-null  object 
 6   Description           238317 non-null  object 
 7   Location Description  237540 non-null  object 
 8   Arrest                238317 non-null  bool   
 9   Domestic              238317 non-null  bool   
 10  Beat                  238317 non-null  int64  
 11  District              238317 non-null  int64  
 12  Ward                  238307 non-null  float64
 13  Community Area        238317 non-null  int64  
 14  FBI Code              238317 non-null  object 
 ...
```

Let us look at the output of the <b>info()</b> function. At the top we see that the RangeIndex has 238317 entries. This is the number of rows in the data set. Below that we see the number of columns (22), followed by a list of the columns. In this list, we see the number of the column (#), the name of the column (Column), and the type of the column (Dtype). We also see something called <i>Non-Null Count</i>. This is an interesting column because it gives us the number of entries in each column that have a non-null value.

But what is a null value? A null value is simply an empty value. It is like having an empty cell in an Excel sheet. The data that we deal with is usually not complete: not every row will have information for every column. When a piece of information is missing, the value is empty; it is a null value. The Non-Null Count is therefore the number of entries in the column that are not null, i.e. not missing. For the column <i>ID</i>, for example, there are 238317 non-null values. Since the dataframe has 238317 rows in total, this column has no missing values. Now look at the entry for the column <i>Location Description</i>. There are 237540 non-null values, which means that there are 238317 - 237540 = 777 null values.
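The same subtraction can be done for every column at once. As a minimal sketch on a made-up four-row dataframe, the pandas function <i>count()</i> returns the number of non-null values in each column (the same numbers as info()'s Non-Null Count), and subtracting it from the number of rows gives the null values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few empty cells (None / np.nan mark missing values).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Location Description": ["STREET", None, "RESIDENCE", None],
    "Ward": [48.0, 16.0, np.nan, 11.0],
})

non_null = df.count()          # non-null values per column
missing = len(df) - non_null   # total rows minus non-null = null values
print(missing)                 # ID 0, Location Description 2, Ward 1
```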

As data analysts, it is very important that we understand how many missing values exist in our data. The info() output tells us the number of non-null values, and we can calculate the number of null values by subtracting the number of non-null values from the total number of rows, as we did above. To make things simpler, we can use pandas functions to count the missing values in each column directly, instead of counting the non-missing ones. To do that, we first need to understand the <b>isna()</b> method:

```python
chicago_crimes.isna()
```

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 238312 | False | False | False | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 238313 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 238314 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 238315 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


Looking at the output above, we see that every cell is now either <i>True</i> or <i>False</i>. The isna() method returns a dataframe of the same size in which each cell is False when the original cell has a value and True when it is null, i.e. empty. For example, row 238312 shows True under <i>Location Description</i>, because that row has no location description. But how is this useful? It is not useful by itself, but it becomes useful when we combine it with another function:

```python
chicago_crimes.isna().sum()
```

```
ID                         0
Case Number                0
Date                       0
Block                      0
IUCR                       0
Primary Type               0
Description                0
Location Description     777
Arrest                     0
Domestic                   0
Beat                       0
District                   0
Ward                      10
Community Area             0
FBI Code                   0
X Coordinate            5199
Y Coordinate            5199
Year                       0
Updated On                 0
Latitude                5199
Longitude               5199
Location                5199
dtype: int64
```

The <b>sum()</b> function simply adds together all the values in a column. True is treated as 1 and False as 0, so when we add the True and False values, we get the number of True values, and each True represents a null (missing) value. Looking at the output, we see that for the <i>ID</i> column the sum is 0. This means that all values were False, so we were just adding zeros: this column has no null values. Now look at the column <i>X Coordinate</i>. The sum is 5199. This means that there were 5199 True values after using the isna() function. Each of these True values was counted as 1, so adding them gave us the total number of missing values.

In summary, the isna() function returns a dataframe with True for missing values and False for non-missing values. The sum() method adds all values in each column together, with True counting as 1 and False as 0. Therefore, the sum for each column gives us the total number of null values in that column. In our case, most columns have no null values, while a few (such as <i>Location Description</i>, <i>Ward</i>, and the coordinate and location columns) have some.
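The whole chain can be tried on a tiny made-up dataframe. This sketch shows the True/False grid, the fact that True counts as 1 when added, the per-column totals, and, by calling sum() a second time, the total number of missing cells in the whole dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing location, two missing coordinates.
df = pd.DataFrame({
    "Location Description": ["STREET", None, "SIDEWALK"],
    "X Coordinate": [1165640.0, np.nan, np.nan],
})

flags = df.isna()            # same shape as df: True where a value is missing
print(flags)

print(True + True + False)   # 2 -> True counts as 1, False as 0
print(flags.sum())           # missing values per column: 1 and 2
print(flags.sum().sum())     # missing values in the whole dataframe: 3
```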

So now we know how many rows are in the dataframe. We also know how many null values are in each of the columns. Let us now start looking at specific rows or specific columns.

## Zooming in to a specific row
We know by now that the <b>head()</b> function displays the first five rows while the <b>tail()</b> function displays the last five rows. What if we wanted to look at a specific row? Let us look at the row numbered 10, for example.

```python
chicago_crimes.loc[10]
```

```
ID                                          12798495
Case Number                                 JF361883
Date                          08/18/2022 06:00:00 PM
Block                                030XX E 80TH ST
IUCR                                            2820
Primary Type                           OTHER OFFENSE
Description                         TELEPHONE THREAT
Location Description                       RESIDENCE
Arrest                                         False
Domestic                                        True
Beat                                             422
District                                           4
Ward                                             7.0
Community Area                                    46
FBI Code                                         08A
X Coordinate                               1197715.0
Y Coordinate                               1852504.0
Year                                            2022
Updated On                    01/03/2023 03:46:28 PM
Latitude                                   41.750117
Longitude                                 -87.551051
Location                (41.75011688, -87.551050773)
Name: 10, dtype: object
```

The <b>loc[]</b> method allows us to look at a specific row. Here we used it to retrieve the information in row number 10. Notice that pandas displays the output vertically, which makes the values much easier to read. It is important to note that pandas numbers rows starting from zero (strictly speaking, loc[] looks rows up by their index label, and the default labels are 0, 1, 2, ...). So the first row is assigned the number 0, not 1. This means that when we look at the row numbered 10, we are actually looking at the 11th row. To look at the 10th row we would use the following:

```python
chicago_crimes.loc[9]
```

```
ID                                           12793378
Case Number                                  JF355848
Date                           08/13/2022 10:45:00 PM
Block                             118XX S SANGAMON ST
IUCR                                             0560
Primary Type                                  ASSAULT
Description                                    SIMPLE
Location Description                           STREET
Arrest                                          False
Domestic                                         True
Beat                                              524
District                                            5
Ward                                             34.0
Community Area                                     53
FBI Code                                          08A
X Coordinate                                1172121.0
Y Coordinate                                1826361.0
Year                                             2022
...
```
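Because chicago_crimes has the default labels 0, 1, 2, ..., a row's label and its position are the same number. They can differ, though (for instance after rows are removed or reordered). The sketch below builds a made-up dataframe with non-default labels to show the difference between loc[] (by label) and its position-based counterpart <b>iloc[]</b>:

```python
import pandas as pd

# Hypothetical data whose labels (10, 0, 5) do not match positions (0, 1, 2).
df = pd.DataFrame(
    {"Primary Type": ["THEFT", "ROBBERY", "ASSAULT"]},
    index=[10, 0, 5],
)

print(df.loc[10])   # row whose LABEL is 10 -> THEFT (the first row)
print(df.iloc[0])   # row at POSITION 0     -> THEFT (the first row)
print(df.loc[0])    # row whose LABEL is 0  -> ROBBERY (the second row)
```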

We can even use this method to look at more than one row. Using the ":" we can display the rows numbered 10, 11, and 12:

```python
chicago_crimes.loc[10:12]
```

|   | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 12798495 | JF361883 | 08/18/2022 06:00:00 PM | 030XX E 80TH ST | 2820 | OTHER OFFENSE | TELEPHONE THREAT | RESIDENCE | False | True | ... | 7.0 | 46 | 08A | 1197715.0 | 1852504.0 | 2022 | 01/03/2023 03:46:28 PM | 41.750117 | -87.551051 | (41.75011688, -87.551050773) |
| 11 | 12796933 | JF360155 | 08/17/2022 07:00:00 AM | 001XX E 42ND ST | 0820 | THEFT | \$500 AND UNDER | STREET | False | False | ... | 3.0 | 38 | 06 | 1177979.0 | 1877198.0 | 2022 | 01/03/2023 03:46:28 PM | 41.818349 | -87.622623 | (41.81834934, -87.622623461) |
| 12 | 12794155 | JF356683 | 08/14/2022 04:57:00 PM | 050XX W BELDEN AVE | 041A | BATTERY | AGGRAVATED - HANDGUN | SIDEWALK | False | False | ... | 36.0 | 19 | 04B | 1142223.0 | 1914820.0 | 2022 | 01/03/2023 03:46:28 PM | 41.922326 | -87.752857 | (41.922325648, -87.75285662) |


Here we used the loc[] method to display the three consecutive rows numbered 10, 11, and 12. The ":" tells the method that we want a range of rows, starting with the number to the left of the ":" and ending with the number to the right of it. Note that, unlike most slicing in Python, loc[] includes the row on the right of the ":" as well.
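The inclusive end is easy to forget, so here is a short sketch comparing loc[] with iloc[] and with an ordinary Python list slice, using a made-up five-row dataframe with the default labels:

```python
import pandas as pd

df = pd.DataFrame({"ID": [100, 101, 102, 103, 104]})   # default labels 0..4

print(df.loc[1:3])    # labels 1, 2 AND 3: loc includes the end label
print(df.iloc[1:3])   # positions 1 and 2 only: iloc stops before the end
print([100, 101, 102, 103, 104][1:3])   # plain list slice -> [101, 102]
```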

## Zooming into a specific column
We can also look at a specific column:

```python
chicago_crimes["Primary Type"]
```

```
0                    ROBBERY
1                      THEFT
2                      THEFT
3                    ASSAULT
4                      THEFT
                 ...        
238312    DECEPTIVE PRACTICE
238313       CRIMINAL DAMAGE
238314              BURGLARY
238315                 THEFT
238316                 THEFT
Name: Primary Type, Length: 238317, dtype: object
```

We can also look at more than one column:

```python
chicago_crimes[["Primary Type", "Description"]]
```

|   | Primary Type | Description |
|---|---|---|
| 0 | ROBBERY | VEHICULAR HIJACKING |
| 1 | THEFT | OVER \$500 |
| 2 | THEFT | OVER \$500 |
| 3 | ASSAULT | SIMPLE |
| 4 | THEFT | \$500 AND UNDER |
| ... | ... | ... |
| 238312 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER \$ 300 |
| 238313 | CRIMINAL DAMAGE | TO VEHICLE |
| 238314 | BURGLARY | UNLAWFUL ENTRY |
| 238315 | THEFT | OVER \$500 |


Notice that we used double square brackets. This is an important point because it has to do with a very important data structure in Python: the array.

### Arrays
An array is a collection of items; it is how we group values together so that we can pass them to a method as a single input. Some methods take one value as input, while others can take a collection of values. When we wanted to display only the column <i>Primary Type</i>, we just typed "Primary Type" between the brackets. But to display more than one column, we have to send a collection of column names, and this collection is the array. To collect items inside an array, we use square brackets []. (In Python, a collection written with square brackets like this is called a <i>list</i>.) So, to create an array that contains the names of two columns, we write:

```python
my_array = ["Primary Type", "Description"]
```

We can then use this as the input to display the columns specified above:

```python
chicago_crimes[my_array]
```

|   | Primary Type | Description |
|---|---|---|
| 0 | ROBBERY | VEHICULAR HIJACKING |
| 1 | THEFT | OVER \$500 |
| 2 | THEFT | OVER \$500 |
| 3 | ASSAULT | SIMPLE |
| 4 | THEFT | \$500 AND UNDER |
| ... | ... | ... |
| 238312 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER \$ 300 |
| 238313 | CRIMINAL DAMAGE | TO VEHICLE |
| 238314 | BURGLARY | UNLAWFUL ENTRY |
| 238315 | THEFT | OVER \$500 |


If we want to do this in one step, i.e. without first saving the array in a variable, we can write the array directly inside the brackets:

```python
chicago_crimes[["Primary Type", "Description"]]
```

|   | Primary Type | Description |
|---|---|---|
| 0 | ROBBERY | VEHICULAR HIJACKING |
| 1 | THEFT | OVER \$500 |
| 2 | THEFT | OVER \$500 |
| 3 | ASSAULT | SIMPLE |
| 4 | THEFT | \$500 AND UNDER |
| ... | ... | ... |
| 238312 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER \$ 300 |
| 238313 | CRIMINAL DAMAGE | TO VEHICLE |
| 238314 | BURGLARY | UNLAWFUL ENTRY |
| 238315 | THEFT | OVER \$500 |
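An array (a Python list) is an ordinary object that you can inspect and change before using it. The sketch below, on a hypothetical one-row dataframe, checks its type and length, reads its first item, adds a column name, and shows that the columns come back in the order they appear in the list:

```python
import pandas as pd

# Hypothetical one-row dataframe with three columns.
df = pd.DataFrame({
    "Date": ["08/09/2022 04:07:00 PM"],
    "Block": ["014XX W ELMDALE AVE"],
    "Primary Type": ["ROBBERY"],
})

my_array = ["Primary Type", "Date"]   # a Python list (what this course calls an array)
print(type(my_array))   # <class 'list'>
print(len(my_array))    # 2 items
print(my_array[0])      # 'Primary Type' (counting starts at 0, as with rows)

my_array.append("Block")   # lists can grow
print(df[my_array])        # columns appear in the order listed
```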


Let us take another example. If we wanted to display only the column <i>Date</i>, then we can use the following:

```python
chicago_crimes["Date"]
```

```
0         08/09/2022 04:07:00 PM
1         08/10/2022 04:00:00 PM
2         08/11/2022 10:00:00 AM
3         08/15/2022 09:14:00 PM
4         08/16/2022 04:10:00 PM
                   ...          
238312    06/27/2022 10:05:00 AM
238313    12/22/2022 06:00:00 PM
238314    12/19/2022 02:00:00 PM
238315    12/20/2022 06:45:00 AM
238316    12/26/2022 10:30:00 PM
Name: Date, Length: 238317, dtype: object
```

If we also wanted to display the column <i>Block</i>, we would need to combine both column names in an array and use that array as the input:

```python
chicago_crimes[["Date", "Block"]]
```

|   | Date | Block |
|---|---|---|
| 0 | 08/09/2022 04:07:00 PM | 014XX W ELMDALE AVE |
| 1 | 08/10/2022 04:00:00 PM | 062XX S ARTESIAN AVE |
| 2 | 08/11/2022 10:00:00 AM | 094XX S STATE ST |
| 3 | 08/15/2022 09:14:00 PM | 048XX S KARLOV AVE |
| 4 | 08/16/2022 04:10:00 PM | 015XX S HALSTED ST |
| ... | ... | ... |
| 238312 | 06/27/2022 10:05:00 AM | 025XX N HALSTED ST |
| 238313 | 12/22/2022 06:00:00 PM | 020XX W CORNELIA AVE |
| 238314 | 12/19/2022 02:00:00 PM | 044XX N ROCKWELL ST |
| 238315 | 12/20/2022 06:45:00 AM | 027XX W ROOSEVELT RD |


## Zooming in to a certain row and column
We can combine all of the above together. Let us say that we want to look at the values of <i>Date</i> and <i>Description</i> in row 13. We can do the following:

```python
chicago_crimes[["Date", "Description"]].loc[13]
```

```
Date           08/11/2022 06:00:00 PM
Description      AGGRAVATED - HANDGUN
Name: 13, dtype: object
```

Now we want to do the same for the rows numbered 15 to 20:

```python
chicago_crimes[["Date", "Description"]].loc[15:20]
```

|   | Date | Description |
|---|---|---|
| 15 | 08/18/2022 02:25:00 PM | OVER \$500 |
| 16 | 08/18/2022 10:30:00 PM | SIMPLE |
| 17 | 08/10/2022 10:55:00 PM | ARMED - HANDGUN |
| 18 | 08/14/2022 01:45:00 PM | RETAIL THEFT |
| 19 | 08/14/2022 10:00:00 PM | OVER \$500 |
| 20 | 08/13/2022 01:30:00 AM | AUTOMOBILE |
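loc[] can also take the rows and the columns together, separated by a comma (rows first, then columns). For reading data, chicago_crimes.loc[13, ["Date", "Description"]] should give the same values as the two-step version used above. This sketch shows the pattern on a small hypothetical dataframe:

```python
import pandas as pd

# Hypothetical two-row dataframe (values invented for illustration).
df = pd.DataFrame({
    "Date": ["08/11/2022 06:00:00 PM", "08/18/2022 02:25:00 PM"],
    "Description": ["AGGRAVATED - HANDGUN", "OVER $500"],
    "Ward": [36.0, 3.0],
})

# Rows first, then columns, inside the same loc[...]
print(df.loc[0, ["Date", "Description"]])     # one row, two columns
print(df.loc[0:1, ["Date", "Description"]])   # rows 0 to 1, two columns
print(df.loc[1, "Ward"])                      # a single cell -> 3.0
```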


Now that we know how to zoom into a column, let us count the number of null values in a specific column. Previously, we saw how to count the number of null values for all columns, like so:

```python
chicago_crimes.isna().sum()
```

```
ID                         0
Case Number                0
Date                       0
Block                      0
IUCR                       0
Primary Type               0
Description                0
Location Description     777
Arrest                     0
Domestic                   0
Beat                       0
District                   0
Ward                      10
Community Area             0
FBI Code                   0
X Coordinate            5199
Y Coordinate            5199
Year                       0
Updated On                 0
Latitude                5199
Longitude               5199
Location                5199
dtype: int64
```

If we were just interested in a certain column, then we can do the following:

```python
chicago_crimes["Location Description"].isna().sum()
```

```
777
```

If we wanted to do this for three columns:

```python
chicago_crimes[["Date", "Location Description", "Ward"]].isna().sum()
```

```
Date                      0
Location Description    777
Ward                     10
dtype: int64
```

This brings us to a very important point. When we choose a subset of columns by passing an array of names inside the square brackets [], the result is still a dataframe. (Choosing a single column with single brackets, as in `chicago_crimes["Location Description"]`, gives a pandas Series, i.e. a single column, which also has the isna() and sum() functions.) This means that the functions that we used on the whole dataframe can also be used on a subset of the dataframe, which is why the isna() function worked here. The variable <i>chicago_crimes</i> is a pandas dataframe, and the command `chicago_crimes[["Date", "Location Description", "Ward"]]` also returns a pandas dataframe. If you want to double-check this, create a new variable to store the subset of the data:

```python
chicago_crimes_subset = chicago_crimes[["Date", "Location Description", "Ward"]]
chicago_crimes_subset.head()
```

|   | Date | Location Description | Ward |
|---|---|---|---|
| 0 | 08/09/2022 04:07:00 PM | STREET | 48.0 |
| 1 | 08/10/2022 04:00:00 PM | STREET | 16.0 |
| 2 | 08/11/2022 10:00:00 AM | STREET | 9.0 |
| 3 | 08/15/2022 09:14:00 PM | RESIDENCE | 14.0 |
| 4 | 08/16/2022 04:10:00 PM | SIDEWALK | 11.0 |


See? A subset of a dataframe is also a dataframe. 
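We can also confirm this by asking Python for the type of each result. The sketch below uses a tiny, hypothetical dataframe called `crimes` instead of the real data. It shows that a single column name returns a Series, while a list of names returns a dataframe, even when the list holds only one name.

```python
import pandas as pd

# Hypothetical toy data standing in for chicago_crimes
crimes = pd.DataFrame({
    "Date": ["08/09/2022", "08/10/2022"],
    "Location Description": ["STREET", "RESIDENCE"],
    "Ward": [48.0, 16.0],
})

single = crimes["Ward"]                                  # one name -> Series
subset = crimes[["Ward"]]                                # list with one name -> DataFrame
subset3 = crimes[["Date", "Location Description", "Ward"]]  # list of names -> DataFrame

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame
print(type(subset3).__name__)  # DataFrame
```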

# Removing Rows and Columns
Data analysts do not just look at data; they also need to modify it. One of the most common tasks is deleting rows and columns, and pandas provides a single method that can do both: <b>drop()</b>. For example, assume that we want to delete the row labelled 3 (the label is the number shown at the left of each row, known as the index). We can delete it as follows:

In [26]:
chicago_crimes.drop(3)

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


Notice that we no longer have a row that is numbered 3. Let us now use the <b>head()</b> function to display the first five rows:

In [27]:
chicago_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"


Row number 3 is still there. What happened? By default, <b>drop()</b> does not change the dataframe it is called on. The command ```chicago_crimes.drop(3)``` returned a new dataframe without row 3, but it left the variable <i>chicago_crimes</i> exactly as it was, and since we never stored the returned dataframe, it was lost. If we want to keep our work, we have two options. The first option is the following:

In [28]:
chicago_crimes_new = chicago_crimes.drop(3)
chicago_crimes_new.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,3,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,6,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,6,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,6,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,6,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"


Here, we created a new variable to save the returned dataframe.
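To see that the original really is left alone, here is a minimal sketch on a hypothetical five-row dataframe called `crimes` (toy values, not the real data). It compares the original with the dataframe returned by <b>drop()</b>.

```python
import pandas as pd

# Hypothetical toy data: five rows labelled 0 to 4
crimes = pd.DataFrame({"Primary Type": ["ROBBERY", "THEFT", "THEFT", "ASSAULT", "THEFT"]})

crimes_new = crimes.drop(3)   # returns a new DataFrame without the row labelled 3

print(len(crimes), len(crimes_new))   # 5 4
print(3 in crimes.index)              # True: the original is untouched
print(3 in crimes_new.index)          # False
print(crimes_new.index.tolist())      # [0, 1, 2, 4]
```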

Another option is to tell <b>drop()</b> to modify the dataframe directly instead of returning a new one. This is accomplished as follows:

In [29]:
chicago_crimes.drop(3, inplace=True)

In the above command, we tell <b>drop()</b> to remove row number 3, and we set the parameter <i>inplace</i> to True. This means that the change is applied to <i>chicago_crimes</i> itself instead of to a new dataframe; in this case, <b>drop()</b> returns None rather than a dataframe. Let us now look at the dataframe:

In [30]:
chicago_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,3,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,6,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,6,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,6,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,6,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"


We now see that the row which is numbered 3 is no longer part of the dataframe.
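Because an in-place <b>drop()</b> returns None, mixing the two options causes a common mistake: assigning the result of an in-place call to a variable. The sketch below shows this on a hypothetical toy dataframe called `crimes`; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: five rows labelled 0 to 4
crimes = pd.DataFrame({"Primary Type": ["ROBBERY", "THEFT", "THEFT", "ASSAULT", "THEFT"]})

result = crimes.drop(3, inplace=True)
print(result)        # None: crimes was modified, nothing was returned
print(len(crimes))   # 4

# A common slip: storing the result of an in-place call replaces the data with None
broken = crimes.copy()
broken = broken.drop(0, inplace=True)
print(broken)        # None
```

In short, either store the returned dataframe or use <i>inplace=True</i>, but not both.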

We can also use the <b>drop()</b> method to delete more than one row:

In [31]:
chicago_crimes.drop([4, 5])

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
6,12796316,JF358737,08/16/2022 11:57:00 AM,012XX N LA SALLE DR,0460,BATTERY,SIMPLE,APARTMENT,False,False,...,2.0,8,08B,1174910.0,1908582.0,2022,01/03/2023 03:46:28 PM,41.904538,-87.632942,"(41.904538325, -87.632942313)"
7,12795474,JF358356,08/15/2022 10:00:00 PM,061XX S PARK SHORE EAST CT,0810,THEFT,OVER $500,STREET,False,False,...,5.0,42,06,1187557.0,1864569.0,2022,01/03/2023 03:46:28 PM,41.783472,-87.587890,"(41.783471704, -87.587890396)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


Notice that we now use the square brackets []. As you recall, when we want to pass more than one value, we need to combine them into a list. We can see from the output that the rows labelled 4 and 5 are no longer there (in addition to row 3, which we deleted earlier). Remember, the dataframe <i>chicago_crimes</i> still contains rows 4 and 5, since we did not save this drop action with the <i>inplace</i> parameter.
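One detail worth checking: <b>drop()</b> works with index labels, not with positions. After row 3 is gone, the row labelled 4 sits in the fourth position, and asking to drop label 3 again raises an error. The sketch below assumes a hypothetical seven-row toy dataframe called `crimes`; the <i>errors="ignore"</i> argument is a standard <b>drop()</b> option not used in the text above, shown here only for illustration.

```python
import pandas as pd

# Hypothetical toy data: seven rows labelled 0 to 6
crimes = pd.DataFrame({"Primary Type": ["ROBBERY", "THEFT", "THEFT", "ASSAULT",
                                        "THEFT", "THEFT", "BATTERY"]})
crimes.drop(3, inplace=True)          # labels are now 0, 1, 2, 4, 5, 6

# drop() removes rows by label
fewer = crimes.drop([4, 5])
print(fewer.index.tolist())           # [0, 1, 2, 6]

# Label 3 no longer exists, so dropping it raises a KeyError
try:
    crimes.drop(3)
except KeyError:
    print("label 3 not found")

# errors="ignore" skips labels that are missing instead of raising
same = crimes.drop([3, 4], errors="ignore")
print(same.index.tolist())            # [0, 1, 2, 5, 6]

# To drop by position, look up the label at that position first
label_at_position_3 = crimes.index[3]
print(label_at_position_3)            # 4
```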

We can also delete columns using the <b>drop()</b> function:

In [32]:
chicago_crimes.drop(columns="ID")

Unnamed: 0,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,2013,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,825,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,634,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,1232,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,421,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,1935,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1921,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,1911,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,1135,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


Notice that we instructed the function to drop columns rather than rows by using the <i>columns</i> parameter. We can also drop more than one column. As you should know by now, to do that we need to pass the names as a list:

In [33]:
chicago_crimes.drop(columns=["Date", "Block"])

Unnamed: 0,ID,Case Number,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,2013,20,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,0810,THEFT,OVER $500,STREET,False,False,825,8,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,0810,THEFT,OVER $500,STREET,False,True,634,6,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,1232,12,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,0810,THEFT,OVER $500,RESIDENCE,False,False,421,4,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,1935,19,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1921,19,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,1911,19,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,0810,THEFT,OVER $500,STREET,False,False,1135,11,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


The above commands have not been saved: we were only viewing the result of each drop, without storing it. To make sure, display the first five rows of the dataframe:

In [34]:
chicago_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,3,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,6,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,6,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,6,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,6,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"


The columns <i>ID</i>, <i>Date</i>, and <i>Block</i> are still there. If we wanted to drop these columns and to overwrite the dataframe, then we need to use the <i>inplace</i> parameter:

In [35]:
chicago_crimes.drop(columns=["ID", "Date", "Block"], inplace=True)
chicago_crimes.head()

Unnamed: 0,Case Number,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,JF350580,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,2013,20,48.0,77,3,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,JF352712,810,THEFT,OVER $500,STREET,False,False,825,8,16.0,66,6,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,JF352659,810,THEFT,OVER $500,STREET,False,True,634,6,9.0,49,6,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,JF359058,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,1232,12,11.0,28,6,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
5,JF359932,810,THEFT,OVER $500,RESIDENCE,False,False,421,4,7.0,43,6,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"


At this point, I would like to reload the original data set, since we were only deleting rows and columns as an illustration. How do we do that? We can simply download the data set again using the <b>read_csv()</b> function from the <i>pandas</i> package:

In [36]:
chicago_crimes = pd.read_csv("https://data.cityofchicago.org/api/views/9hwr-2zxp/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
chicago_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
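Downloading the file again works, but it repeats a network request for the whole data set. An alternative, not the approach used above, is to keep an untouched copy and experiment on a second one made with <b>copy()</b>. The sketch below assumes a hypothetical toy dataframe called `original` in place of the downloaded CSV.

```python
import pandas as pd

# Hypothetical toy data in place of the downloaded CSV
original = pd.DataFrame({"ID": [1, 2, 3], "Primary Type": ["ROBBERY", "THEFT", "ASSAULT"]})

# Experiment on an independent copy so the original is never modified
working = original.copy()
working.drop(columns="ID", inplace=True)
working.drop(1, inplace=True)

print(original.shape)   # (3, 2): unchanged
print(working.shape)    # (2, 1)

# "Reloading" is then just another copy, with no download
working = original.copy()
print(working.shape)    # (3, 2)
```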


# Using Conditions to Perform Operations

## One Condition
Notice that so far, with very few lines of code, we have been able to download a data set, look at the column types, check the total number of rows, check the number of missing values in each column, look at specific rows and columns, and drop certain rows and columns. Each of these tasks took a single line of code; we have not had to write long blocks of code. This is the beauty of libraries, particularly the <i>pandas</i> library.

While what we have done so far is useful and important, it definitely is not enough. Yes, it is useful to look at a subset of the data, but do we really need to look at the 4th row? The 20th row? A more useful thing to do is to look at the rows that satisfy a certain condition. This is perhaps one of the most common tasks that a data analyst performs.

As an example, let us consider the type of crime. We know that our data set has a column called <i>Primary Type</i>:

In [37]:
chicago_crimes["Primary Type"]

0                    ROBBERY
1                      THEFT
2                      THEFT
3                    ASSAULT
4                      THEFT
                 ...        
238312    DECEPTIVE PRACTICE
238313       CRIMINAL DAMAGE
238314              BURGLARY
238315                 THEFT
238316                 THEFT
Name: Primary Type, Length: 238317, dtype: object

We see that the types of crimes vary. What if we just wanted to look at the records that were of type THEFT? This is possible using the following command:

In [38]:
chicago_crimes[chicago_crimes["Primary Type"] == "THEFT"]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
7,12795474,JF358356,08/15/2022 10:00:00 PM,061XX S PARK SHORE EAST CT,0810,THEFT,OVER $500,STREET,False,False,...,5.0,42,06,1187557.0,1864569.0,2022,01/03/2023 03:46:28 PM,41.783472,-87.587890,"(41.783471704, -87.587890396)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238307,12936322,JF526096,12/24/2022 01:30:00 PM,001XX W RANDOLPH ST,0860,THEFT,RETAIL THEFT,DRUG STORE,False,False,...,42.0,32,06,1175414.0,1901275.0,2022,01/03/2023 03:46:28 PM,41.884476,-87.631311,"(41.884476226, -87.631310613)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


There are several things to explain about the above command. First, the double "=" sign. In Python, as in many other programming languages, a single "=" assigns a value to a variable, while a double "=" checks whether two things are equal. So to check whether 2 is equal to 2, we type:

In [39]:
2==2

True

Therefore, when we type ```chicago_crimes["Primary Type"] == "THEFT"```, pandas checks this condition for every row:

In [40]:
chicago_crimes["Primary Type"] == "THEFT"

0         False
1          True
2          True
3         False
4          True
          ...  
238312    False
238313    False
238314    False
238315     True
238316     True
Name: Primary Type, Length: 238317, dtype: bool

Notice the output. We have a <i>True</i> or a <i>False</i> for each row: <i>True</i> for rows whose <i>Primary Type</i> is "THEFT", and <i>False</i> for rows with any other value. To see this, let us look at the first five rows:

In [41]:
chicago_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"


See the <i>Primary Type</i> values? We have ROBBERY, THEFT, THEFT, ASSAULT, THEFT, which results in False, True, True, False, True.

We then use this to pick the subset of our dataframe. When the value is <i>True</i>, then the corresponding row will be part of the subset. When the value is <i>False</i> then the row will not be part of the subset:

In [42]:
chicago_crimes[chicago_crimes["Primary Type"]=="THEFT"]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
7,12795474,JF358356,08/15/2022 10:00:00 PM,061XX S PARK SHORE EAST CT,0810,THEFT,OVER $500,STREET,False,False,...,5.0,42,06,1187557.0,1864569.0,2022,01/03/2023 03:46:28 PM,41.783472,-87.587890,"(41.783471704, -87.587890396)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238307,12936322,JF526096,12/24/2022 01:30:00 PM,001XX W RANDOLPH ST,0860,THEFT,RETAIL THEFT,DRUG STORE,False,False,...,42.0,32,06,1175414.0,1901275.0,2022,01/03/2023 03:46:28 PM,41.884476,-87.631311,"(41.884476226, -87.631310613)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


We can save this result in a new dataframe that contains only theft crimes. Remember, a subset of a dataframe is a dataframe:

In [43]:
chicago_crimes_thefts = chicago_crimes[chicago_crimes["Primary Type"]=="THEFT"]
chicago_crimes_thefts.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,810,THEFT,OVER $500,STREET,False,False,...,16.0,66,6,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,810,THEFT,OVER $500,STREET,False,True,...,9.0,49,6,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,6,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.86025,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,6,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
7,12795474,JF358356,08/15/2022 10:00:00 PM,061XX S PARK SHORE EAST CT,810,THEFT,OVER $500,STREET,False,False,...,5.0,42,6,1187557.0,1864569.0,2022,01/03/2023 03:46:28 PM,41.783472,-87.58789,"(41.783471704, -87.587890396)"
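The two steps behind this filter, building the True/False values and then using them to pick rows, can be separated as in the sketch below. It uses a hypothetical five-row toy dataframe called `crimes` whose values mirror the first five rows shown above; the name `mask` is our own choice. Because True counts as 1 when summed, the same True/False values also count the matching rows.

```python
import pandas as pd

# Hypothetical toy data mirroring the first five Primary Type values above
crimes = pd.DataFrame({"Primary Type": ["ROBBERY", "THEFT", "THEFT", "ASSAULT", "THEFT"]})

# Step 1: the comparison produces one True/False value per row
mask = crimes["Primary Type"] == "THEFT"
print(mask.tolist())            # [False, True, True, False, True]

# Step 2: passing these values inside [] keeps only the rows marked True
thefts = crimes[mask]
print(thefts.index.tolist())    # [1, 2, 4]

# Summing the True/False values counts how many rows matched
print(mask.sum())               # 3
```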


Let us look at another example. What if I wanted a dataframe that contains only the crimes that resulted in an arrest? To do that, we might type the following:

In [44]:
chicago_crimes[chicago_crimes["Arrest"]=="True"]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location


This seems strange. Is it possible that the data set does not contain any crime that resulted in an arrest? Actually, we have made a simple but serious mistake. In the condition above, we are checking whether the value of <i>Arrest</i> is equal to the text "True", but if you remember well, this column is of type <i>bool</i> (which means each value is either True or False). Let us see this:

In [45]:
chicago_crimes.dtypes

ID                        int64
Case Number              object
Date                     object
Block                    object
IUCR                     object
Primary Type             object
Description              object
Location Description     object
Arrest                     bool
Domestic                   bool
Beat                      int64
District                  int64
Ward                    float64
Community Area            int64
FBI Code                 object
X Coordinate            float64
Y Coordinate            float64
Year                      int64
Updated On               object
Latitude                float64
Longitude               float64
Location                 object
dtype: object

Since we are just interested in the <i>Arrest</i> column, we could have typed the following:

In [46]:
chicago_crimes["Arrest"].dtypes

dtype('bool')
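The two attributes used above behave slightly differently, which a tiny example makes visible. The sketch below builds a small, made-up dataframe that borrows three column names from <i>chicago_crimes</i> (the values are invented for illustration). <i>DataFrame.dtypes</i> returns one type per column, while selecting a single column and reading its <i>.dtype</i> returns just that column's type.

```python
import pandas as pd

# A tiny, made-up frame reusing a few column names from chicago_crimes.
crimes = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY"],
    "Arrest": [False, True],
    "Community Area": [66, 29],
})

# DataFrame.dtypes: a Series holding one dtype per column.
print(crimes.dtypes)

# Series.dtype: the single dtype of one selected column.
print(crimes["Arrest"].dtype)          # bool
print(crimes["Community Area"].dtype)  # int64
```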

In either case, notice that the type is <i>bool</i>, not <i>object</i>. The column holds the Boolean values True and False, not the text "True" and "False". This is why the condition should have been written the following way:

In [47]:
chicago_crimes[chicago_crimes["Arrest"]==True]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
37,12796098,JF359198,08/16/2022 05:20:00 PM,033XX W ROOSEVELT RD,0560,ASSAULT,SIMPLE,RESTAURANT,True,False,...,24.0,29,08A,1154087.0,1894508.0,2022,01/03/2023 03:46:28 PM,41.866359,-87.709807,"(41.866358918, -87.709806763)"
61,12801503,JF365550,08/22/2022 05:50:00 AM,086XX S KARLOV AVE,0910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,True,False,...,18.0,70,07,1150583.0,1847019.0,2022,01/03/2023 03:46:28 PM,41.736111,-87.723906,"(41.736111204, -87.723906429)"
67,12802877,JF367040,08/23/2022 08:45:00 AM,056XX N MOZART ST,0560,ASSAULT,SIMPLE,SCHOOL - PUBLIC GROUNDS,True,False,...,40.0,2,08A,1156333.0,1937407.0,2022,01/03/2023 03:46:28 PM,41.984032,-87.700399,"(41.984032004, -87.700398992)"
72,12812318,JF378344,09/01/2022 12:25:00 AM,006XX N MICHIGAN AVE,2022,NARCOTICS,POSSESS - COCAINE,STREET,True,False,...,42.0,8,18,1177309.0,1904938.0,2022,01/03/2023 03:46:28 PM,41.894485,-87.624241,"(41.894484934, -87.62424085)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237927,12931560,JF521115,12/23/2022 07:06:00 PM,001XX E SUPERIOR ST,1812,NARCOTICS,POSSESS - CANNABIS MORE THAN 30 GRAMS,STREET,True,False,...,42.0,8,18,1177164.0,1905392.0,2022,01/03/2023 03:46:28 PM,41.895734,-87.624760,"(41.89573402, -87.624759616)"
237936,12925588,JF513886,12/17/2022 11:22:00 AM,039XX W FILLMORE ST,2026,NARCOTICS,POSSESS - PCP,VEHICLE NON-COMMERCIAL,True,False,...,24.0,29,18,1150188.0,1895080.0,2022,01/03/2023 03:46:28 PM,41.868005,-87.724106,"(41.868005372, -87.724105561)"
237967,12925155,JF513324,12/16/2022 08:12:00 PM,045XX S WESTERN BLVD,0860,THEFT,RETAIL THEFT,APPLIANCE STORE,True,False,...,12.0,61,06,1161289.0,1874276.0,2022,01/03/2023 03:46:28 PM,41.810694,-87.683929,"(41.810693516, -87.683929054)"
237999,12938989,JG100842,12/04/2022 07:00:00 PM,011XX N CLARK ST,2090,NARCOTICS,ALTER / FORGE PRESCRIPTION,DRUG STORE,True,False,...,2.0,8,18,1175324.0,1908199.0,2022,01/03/2023 03:46:28 PM,41.903478,-87.631433,"(41.903478069, -87.631433102)"


Notice the difference between the two commands? The word <i>True</i> is no longer in quotation marks. When the column is of type <i>object</i> (text), we put the value in quotation marks. When the column is of type <i>bool</i>, we do not.
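To see the mistake and its fix side by side, here is a minimal sketch on a made-up three-row dataframe (not the real crimes data). Comparing a <i>bool</i> column with the string "True" matches nothing, while comparing it with the Boolean value True keeps the expected rows. Since a <i>bool</i> column is already a True/False mask, it can also be used directly inside the square brackets.

```python
import pandas as pd

# Made-up data: three crimes, two of which resulted in an arrest.
crimes = pd.DataFrame({"Arrest": [True, False, True]})

# Comparing a bool column with the *string* "True" never matches: empty result.
wrong = crimes[crimes["Arrest"] == "True"]

# Comparing with the Boolean value True keeps rows 0 and 2.
right = crimes[crimes["Arrest"] == True]

# A bool column is already a mask, so it can be used on its own.
shorthand = crimes[crimes["Arrest"]]

print(len(wrong), len(right), len(shorthand))  # 0 2 2
```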

Let us try another example:

In [48]:
chicago_crimes[chicago_crimes["Location Description"]=="STREET"]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
7,12795474,JF358356,08/15/2022 10:00:00 PM,061XX S PARK SHORE EAST CT,0810,THEFT,OVER $500,STREET,False,False,...,5.0,42,06,1187557.0,1864569.0,2022,01/03/2023 03:46:28 PM,41.783472,-87.587890,"(41.783471704, -87.587890396)"
8,12799341,JF363039,08/19/2022 11:00:00 AM,010XX E 46TH ST,0910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,...,4.0,39,07,1183968.0,1874723.0,2022,01/03/2023 03:46:28 PM,41.811420,-87.600731,"(41.811419728, -87.600731459)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238295,12935727,JF526034,12/25/2022 03:00:00 PM,061XX N WASHTENAW AVE,0810,THEFT,OVER $500,STREET,False,False,...,40.0,2,06,1157219.0,1940724.0,2022,01/03/2023 03:46:28 PM,41.993116,-87.697050,"(41.993115987, -87.697049774)"
238305,12935459,JF525819,12/22/2022 12:01:00 AM,044XX N MAPLEWOOD AVE,0910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,...,47.0,4,07,1158578.0,1929302.0,2022,01/03/2023 03:46:28 PM,41.961746,-87.692365,"(41.961745665, -87.692365237)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"


Here, we are getting the rows where the value of <i>Location Description</i> is STREET. Notice that we used quotation marks. This is because this column contains text, i.e. it is of type <i>object</i>. What if we wanted to get the records where the value of <i>Domestic</i> was <i>False</i>? Is the <i>Domestic</i> column of type <i>bool</i> or <i>object</i>? Let's check:

In [49]:
chicago_crimes["Domestic"].dtype

dtype('bool')

It is of type <i>bool</i>. We therefore use the following command:

In [50]:
chicago_crimes[chicago_crimes["Domestic"]==False]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


As a final example, let us look at crimes that took place in <i>Community Area</i> 29:

In [52]:
chicago_crimes[chicago_crimes["Community Area"]==29]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
22,12788778,JF350628,08/09/2022 09:00:00 AM,033XX W 13TH ST,0890,THEFT,FROM BUILDING,LIBRARY,False,False,...,24.0,29,06,1154098.0,1893840.0,2022,01/03/2023 03:46:28 PM,41.864526,-87.709784,"(41.864525633, -87.709784193)"
35,12796489,JF359613,08/17/2022 02:00:00 AM,039XX W OGDEN AVE,0460,BATTERY,SIMPLE,GAS STATION,False,False,...,22.0,29,08B,1150286.0,1889027.0,2022,01/03/2023 03:46:28 PM,41.851393,-87.723904,"(41.851393305, -87.723903515)"
37,12796098,JF359198,08/16/2022 05:20:00 PM,033XX W ROOSEVELT RD,0560,ASSAULT,SIMPLE,RESTAURANT,True,False,...,24.0,29,08A,1154087.0,1894508.0,2022,01/03/2023 03:46:28 PM,41.866359,-87.709807,"(41.866358918, -87.709806763)"
86,12804684,JF369234,08/24/2022 06:30:00 PM,035XX W CERMAK RD,0870,THEFT,POCKET-PICKING,SIDEWALK,False,False,...,24.0,29,06,1152918.0,1889147.0,2022,01/03/2023 03:46:28 PM,41.851671,-87.714240,"(41.851670934, -87.714240208)"
144,12818261,JF385387,08/10/2022 09:35:00 PM,013XX S KOMENSKY AVE,0880,THEFT,PURSE-SNATCHING,ALLEY,False,False,...,24.0,29,06,1149595.0,1893535.0,2022,01/03/2023 03:46:28 PM,41.863777,-87.726323,"(41.863777237, -87.7263227)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237936,12925588,JF513886,12/17/2022 11:22:00 AM,039XX W FILLMORE ST,2026,NARCOTICS,POSSESS - PCP,VEHICLE NON-COMMERCIAL,True,False,...,24.0,29,18,1150188.0,1895080.0,2022,01/03/2023 03:46:28 PM,41.868005,-87.724106,"(41.868005372, -87.724105561)"
237976,12924648,JF512358,12/16/2022 06:14:00 AM,013XX S INDEPENDENCE BLVD,1310,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE - PORCH / HALLWAY,False,False,...,24.0,29,14,1151464.0,1893655.0,2022,01/03/2023 03:46:28 PM,41.864070,-87.719458,"(41.864070063, -87.719458499)"
238110,12935262,JF524783,12/23/2022 06:00:00 AM,014XX S SPAULDING AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,24.0,29,11,1154598.0,1892865.0,2022,01/03/2023 03:46:28 PM,41.861840,-87.707975,"(41.861840145, -87.707974761)"
238188,12936468,JF526997,11/05/2022 03:30:00 PM,012XX S AVERS AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,APARTMENT,False,False,...,24.0,29,11,1150909.0,1894078.0,2022,01/03/2023 03:46:28 PM,41.865242,-87.721485,"(41.865241697, -87.721484832)"


Why did we not put the number 29 in quotation marks? You guessed it: the column is not of type <i>object</i>. What type is it?

In [51]:
chicago_crimes["Community Area"].dtypes

dtype('int64')

It is an integer (<i>int64</i>). Numbers, just like <i>bool</i> values, are not put in quotation marks.
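The same rule applies to numeric columns. In this sketch (invented values, column names borrowed from <i>chicago_crimes</i>), writing "29" in quotes compares the integers with text and returns no rows, while 29 without quotes works. It also shows that a float64 column such as <i>Ward</i> can be compared with a plain integer, because 24 and 24.0 are numerically equal.

```python
import pandas as pd

# Made-up data using two numeric columns from the crimes dataset.
crimes = pd.DataFrame({"Community Area": [29, 66, 29], "Ward": [24.0, 16.0, 28.0]})

# "29" in quotes is text, and text never equals an int64 value: no rows.
as_text = crimes[crimes["Community Area"] == "29"]

# 29 without quotes is a number, so rows 0 and 2 match.
as_number = crimes[crimes["Community Area"] == 29]

# Ward is float64; an integer still compares numerically (24 == 24.0).
ward_24 = crimes[crimes["Ward"] == 24]

print(len(as_text), len(as_number), len(ward_24))  # 0 2 1
```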

## Two Conditions
How do we get data that satisfies two conditions? Here, we need to introduce the <b>OR</b> and <b>AND</b> operators.

### The AND Operator
When filtering <i>pandas</i> dataframes, we write "and" using the symbol "&". So, to check whether two is equal to two and two is greater than zero, we type:

In [53]:
(2==2) & (2>0)

True
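The "&" symbol matters because it works element by element on whole columns. Python's own keyword <i>and</i> expects a single True or False, and pandas refuses to reduce an entire column to one value, so it raises a ValueError. A minimal sketch with two made-up Boolean Series:

```python
import pandas as pd

# Made-up masks for three crimes.
is_theft = pd.Series([True, True, False])
was_arrested = pd.Series([True, False, False])

# & compares the two Series position by position.
both = is_theft & was_arrested
print(both.tolist())  # [True, False, False]

# The keyword "and" needs one True/False for each whole Series,
# which pandas refuses to guess, so it raises a ValueError.
try:
    is_theft and was_arrested
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)  # True
```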

Let us now use this syntax to get a subset of the crimes data where the crime was of type THEFT and an arrest was made:

In [54]:
chicago_crimes[(chicago_crimes["Primary Type"]=="THEFT") & (chicago_crimes["Arrest"]==True)]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
140,12817625,JF384566,09/05/2022 11:27:00 PM,023XX W 21ST ST,0810,THEFT,OVER $500,CONVENIENCE STORE,True,False,...,25.0,31,06,1161190.0,1890017.0,2022,01/03/2023 03:46:28 PM,41.853891,-87.683856,"(41.853890639, -87.683855655)"
461,12848814,JF421820,10/04/2022 03:42:00 PM,031XX N CLARK ST,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,True,False,...,44.0,6,06,1170283.0,1920956.0,2022,01/03/2023 03:46:28 PM,41.938596,-87.649576,"(41.93859561, -87.649576033)"
654,12867355,JF444181,10/22/2022 06:50:00 AM,035XX N BROADWAY,0860,THEFT,RETAIL THEFT,GROCERY FOOD STORE,True,False,...,46.0,6,06,1171076.0,1923738.0,2022,01/03/2023 03:46:28 PM,41.946212,-87.646580,"(41.946212132, -87.646579678)"
849,12891196,JF472493,10/25/2022 12:00:00 AM,035XX S PULASKI RD,0810,THEFT,OVER $500,WAREHOUSE,True,False,...,22.0,30,06,1150261.0,1881205.0,2022,01/03/2023 03:46:28 PM,41.829929,-87.724199,"(41.829929212, -87.724198877)"
857,12886022,JF466544,11/08/2022 02:56:00 AM,017XX N LINDER AVE,0810,THEFT,OVER $500,STREET,True,False,...,37.0,25,06,1139463.0,1911028.0,2022,01/03/2023 03:46:28 PM,41.911971,-87.763090,"(41.911970795, -87.76309048)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237663,12931318,JF520805,12/23/2022 11:50:00 AM,031XX W 103RD ST,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,True,False,...,19.0,74,06,1157026.0,1836163.0,2022,01/03/2023 03:46:28 PM,41.706193,-87.700594,"(41.706192826, -87.700593626)"
237674,12928904,JF517643,12/20/2022 04:30:00 PM,0000X E GRAND AVE,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,True,False,...,42.0,8,06,1176780.0,1903916.0,2022,01/03/2023 03:46:28 PM,41.891692,-87.626215,"(41.891692496, -87.626214622)"
237846,12933440,JF523488,12/26/2022 08:37:00 PM,045XX S WESTERN AVE,0860,THEFT,RETAIL THEFT,APPLIANCE STORE,True,False,...,15.0,61,06,1161131.0,1874302.0,2022,01/03/2023 03:46:28 PM,41.810768,-87.684508,"(41.810768138, -87.684507864)"
237868,12931059,JF520508,12/23/2022 03:40:00 AM,020XX N MILWAUKEE AVE,0860,THEFT,RETAIL THEFT,DEPARTMENT STORE,True,False,...,1.0,22,06,1159682.0,1913243.0,2022,01/03/2023 03:46:28 PM,41.917656,-87.688750,"(41.917656022, -87.688750258)"


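An important point to note here is that each condition must be placed in its own pair of parentheses. In Python, "&" takes precedence over comparisons such as "==", so without the parentheses Python would try to combine the values with "&" before comparing them.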
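A small sketch makes the parentheses rule concrete. It uses a made-up four-row dataframe with the same two column names as the query above. With parentheses the filter works; without them, Python reads the expression as <i>crimes["Primary Type"] == ("THEFT" & crimes["Arrest"]) == True</i>, which raises an error. Depending on the pandas version the error may be a TypeError or a ValueError, so the sketch catches both.

```python
import pandas as pd

# Made-up data with the same column names as chicago_crimes.
crimes = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT"],
    "Arrest": [True, False, True, True],
})

# With parentheses, each comparison runs first and is then combined with &.
theft_arrests = crimes[(crimes["Primary Type"] == "THEFT") & (crimes["Arrest"] == True)]
print(theft_arrests.index.tolist())  # [0, 3]

# Without parentheses, & is applied before ==, which raises an error.
try:
    crimes[crimes["Primary Type"] == "THEFT" & crimes["Arrest"] == True]
    no_parens_failed = False
except (TypeError, ValueError):
    no_parens_failed = True
print(no_parens_failed)  # True
```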

### The OR Operator
In Python (and therefore in <i>pandas</i>), the <b>OR</b> operator is written using the symbol "|":

In [55]:
(2==2) | (2 < 0)

True

Notice that the expression evaluated to True even though 2 is not less than 0. This is because OR only requires at least one of the conditions to be true: since 2==2 is true, the whole expression is true.
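Putting the two operators next to each other helps. The following sketch builds a small truth table from two made-up Boolean Series covering all four combinations of True and False. "&" is True only when both sides are True, while "|" is True when at least one side is True.

```python
import pandas as pd

# All four combinations of two True/False conditions.
left = pd.Series([True, True, False, False])
right = pd.Series([True, False, True, False])

table = pd.DataFrame({
    "left": left,
    "right": right,
    "left & right": left & right,  # True only when both are True
    "left | right": left | right,  # True when at least one is True
})
print(table)
```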

Going back to our dataframe, let us get the crimes that resulted in an arrest or that were domestic:

In [56]:
chicago_crimes[(chicago_crimes["Arrest"]==True) | (chicago_crimes["Domestic"]==True)]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
9,12793378,JF355848,08/13/2022 10:45:00 PM,118XX S SANGAMON ST,0560,ASSAULT,SIMPLE,STREET,False,True,...,34.0,53,08A,1172121.0,1826361.0,2022,01/03/2023 03:46:28 PM,41.678977,-87.645603,"(41.678976937, -87.645603032)"
10,12798495,JF361883,08/18/2022 06:00:00 PM,030XX E 80TH ST,2820,OTHER OFFENSE,TELEPHONE THREAT,RESIDENCE,False,True,...,7.0,46,08A,1197715.0,1852504.0,2022,01/03/2023 03:46:28 PM,41.750117,-87.551051,"(41.75011688, -87.551050773)"
13,12790997,JF353231,08/11/2022 06:00:00 PM,073XX S MAY ST,051A,ASSAULT,AGGRAVATED - HANDGUN,RESIDENCE,False,True,...,17.0,68,04A,1169922.0,1856159.0,2022,01/03/2023 03:46:28 PM,41.760795,-87.652790,"(41.76079497, -87.65279005)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238262,12938984,JG100896,12/25/2022 01:00:00 PM,075XX S EGGLESTON AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,True,...,6.0,69,08A,1174599.0,1854938.0,2022,01/03/2023 03:46:28 PM,41.757342,-87.635685,"(41.757341587, -87.63568495)"
238263,12936415,JF518127,12/21/2022 01:15:00 AM,010XX W 14TH ST,0497,BATTERY,AGGRAVATED DOMESTIC BATTERY - OTHER DANGEROUS ...,APARTMENT,False,True,...,25.0,28,04B,1169755.0,1893571.0,2022,01/03/2023 03:46:28 PM,41.863461,-87.652316,"(41.863461031, -87.652315509)"
238288,12937066,JF522595,12/25/2022 08:06:00 PM,0000X W 109TH ST,1310,CRIMINAL DAMAGE,TO PROPERTY,STREET,False,True,...,34.0,49,14,1177923.0,1832690.0,2022,01/03/2023 03:46:28 PM,41.696216,-87.624175,"(41.696215599, -87.624174764)"
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"


Looking at the result, we see that in each of these records, either <i>Arrest</i> is <i>True</i> or <i>Domestic</i> is <i>True</i>.
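Since the displayed output only shows the first and last few rows, it can help to check every row in code. The sketch below uses a made-up four-row dataframe. It applies the same OR filter and then uses <i>.all()</i> to confirm that every returned row has <i>Arrest</i> or <i>Domestic</i> set to True. The same check line could presumably be run on the real result in the same way.

```python
import pandas as pd

# Made-up data with the same two Boolean columns as chicago_crimes.
crimes = pd.DataFrame({
    "Arrest": [True, False, False, True],
    "Domestic": [False, True, False, False],
})

result = crimes[(crimes["Arrest"] == True) | (crimes["Domestic"] == True)]

# Check every returned row instead of eyeballing the first and last few.
every_row_matches = (result["Arrest"] | result["Domestic"]).all()
print(result.index.tolist(), every_row_matches)  # [0, 1, 3] True
```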

Let us now get the crimes that were committed in <i>Community Area</i> 23 or that were of type BATTERY:

In [None]:
chicago_crimes[(chicago_crimes["Community Area"]==23) | (chicago_crimes["Primary Type"]=="BATTERY")]

Examining these records, we see that all of them are either of type BATTERY or were committed in <i>Community Area</i> 23.

### Less Than or Equal to and Greater than or Equal to
Let us get the crimes that were committed in a <i>Community Area</i> whose number is at least 23 and in a <i>Ward</i> whose number is less than 30:

In [57]:
chicago_crimes[(chicago_crimes["Community Area"] >= 23) & (chicago_crimes["Ward"] < 30)]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238300,12935244,JF505474,12/10/2022 01:00:00 PM,017XX W 86TH ST,1365,CRIMINAL TRESPASS,TO RESIDENCE,APARTMENT,False,False,...,21.0,71,26,1166366.0,1847669.0,2022,01/03/2023 03:46:28 PM,41.737574,-87.666064,"(41.737573664, -87.666064301)"
238301,12935201,JF523453,12/26/2022 02:00:00 PM,013XX W 76TH ST,0610,BURGLARY,FORCIBLE ENTRY,APARTMENT,False,False,...,17.0,71,05,1168669.0,1854374.0,2022,01/03/2023 03:46:28 PM,41.755924,-87.657434,"(41.755923803, -87.65743376)"
238306,12936699,JF524860,12/26/2022 12:00:00 PM,017XX W 79TH ST,0890,THEFT,FROM BUILDING,COMMERCIAL / BUSINESS OFFICE,False,False,...,17.0,71,06,1166290.0,1852314.0,2022,01/03/2023 03:46:28 PM,41.750322,-87.666211,"(41.750321833, -87.666210811)"
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"


The phrase "at least" means "greater than or equal to", which is written using the operator ">=". The condition "less than" is simply written using the operator "<". If we wanted to say "less than or equal to", we would use the operator "<=". As an example, let us get the crimes where the value of <i>X Coordinate</i> is less than or equal to 1177962:
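The only difference between "<" and "<=" (or ">" and ">=") is whether the boundary value itself is included. The sketch below compares three <i>X Coordinate</i> values taken from the output shown in this section against the boundary 1177962, which is itself one of the values.

```python
import pandas as pd

# Three X Coordinate values that appear in the crimes output.
x = pd.Series([1161110.0, 1177962.0, 1193842.0])

print((x < 1177962).tolist())   # [True, False, False]
print((x <= 1177962).tolist())  # [True, True, False]
print((x > 1177962).tolist())   # [False, False, True]
print((x >= 1177962).tolist())  # [False, True, True]
```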

In [58]:
chicago_crimes[chicago_crimes["X Coordinate"] <= 1177962]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


### Not Equal to
There is another operator that we have not yet covered: the "not equal to" operator. What if we wanted the crimes that did not take place in <i>Community Area</i> 23? In other words, we want all crimes where the value of <i>Community Area</i> is not equal to 23. This can be done using the "!=" operator, which means "not equal to":
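A related tool is the "~" operator, which flips every True to False and vice versa in a Boolean Series. The sketch below, on a made-up Series of community area numbers, shows that <i>area != 23</i> and <i>~(area == 23)</i> produce the same mask. The "~" form is useful when a condition has no ready-made "not" version.

```python
import pandas as pd

# Made-up Community Area values.
area = pd.Series([23, 29, 23, 8])

not_23 = area != 23
print(not_23.tolist())  # [False, True, False, True]

# ~ negates each value in the mask, giving the same result as !=.
print((~(area == 23)).equals(not_23))  # True
```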

In [59]:
chicago_crimes[chicago_crimes["Community Area"]!=23]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238312,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,43.0,7,11,1170513.0,1917030.0,2022,01/03/2023 03:46:28 PM,41.927817,-87.648846,"(41.927817456, -87.648845932)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


### The isin() Method
Assume that we want the crimes that were either THEFT or BATTERY. The following command will do the job:

In [60]:
chicago_crimes[(chicago_crimes["Primary Type"]=="THEFT") | (chicago_crimes["Primary Type"]=="BATTERY")]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
6,12796316,JF358737,08/16/2022 11:57:00 AM,012XX N LA SALLE DR,0460,BATTERY,SIMPLE,APARTMENT,False,False,...,2.0,8,08B,1174910.0,1908582.0,2022,01/03/2023 03:46:28 PM,41.904538,-87.632942,"(41.904538325, -87.632942313)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


What about getting the crimes that are THEFT, BATTERY, or ASSAULT? Again, we can use the same logic as above:

In [61]:
chicago_crimes[(chicago_crimes["Primary Type"]=="THEFT") | (chicago_crimes["Primary Type"]=="BATTERY") | (chicago_crimes["Primary Type"]=="ASSAULT")]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


We can chain as many conditions as we want, but this quickly becomes tedious. Luckily, the <i>pandas</i> library provides a very useful method for this: <b>isin()</b>. As the name suggests, it checks whether each value in a column is in a given list of values. Let us take a look at it:
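To see what <b>isin()</b> returns on its own, here is a minimal sketch on a made-up Series of crime types (not the real data). It produces a Boolean Series that is True wherever the value appears in the list, which is the same mask the chained "|" conditions build.

```python
import pandas as pd

# Made-up Primary Type values.
primary_type = pd.Series(["THEFT", "ROBBERY", "BATTERY", "ASSAULT", "NARCOTICS"])

with_isin = primary_type.isin(["THEFT", "BATTERY", "ASSAULT"])
print(with_isin.tolist())  # [True, False, True, True, False]

# The same mask written with chained | conditions.
with_or = (primary_type == "THEFT") | (primary_type == "BATTERY") | (primary_type == "ASSAULT")
print(with_isin.equals(with_or))  # True
```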

In [62]:
chicago_crimes[chicago_crimes["Primary Type"].isin(["THEFT", "BATTERY", "ASSAULT"])]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"


Here, we are simply telling <i>pandas</i> to return the records whose <i>Primary Type</i> value is in the given list. Notice the square brackets: <b>isin()</b> takes a single argument, a list of values, so the three crime types are grouped together into one list rather than passed as separate arguments. We could have performed the above in two steps:

In [63]:
crime_types = ["THEFT", "BATTERY", "ASSAULT"]
chicago_crimes[chicago_crimes["Primary Type"].isin(crime_types)]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
5,12796817,JF359932,08/15/2022 03:00:00 PM,075XX S PHILLIPS AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,7.0,43,06,1193842.0,1855434.0,2022,01/03/2023 03:46:28 PM,41.758253,-87.565147,"(41.758252785, -87.565147001)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238308,12937577,JF515781,12/19/2022 06:00:00 AM,039XX W MONROE ST,0498,BATTERY,"AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SE...",APARTMENT,False,True,...,28.0,26,04B,1150117.0,1899370.0,2022,01/03/2023 03:46:28 PM,41.879779,-87.724254,"(41.879779001, -87.724254458)"
238310,12935276,JF524804,12/21/2022 11:05:00 PM,042XX N MARINE DR,0810,THEFT,OVER $500,STREET,False,False,...,46.0,3,06,1171054.0,1928302.0,2022,01/03/2023 03:46:28 PM,41.958736,-87.646526,"(41.958736385, -87.646526104)"
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"
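
To see why <b>isin()</b> saves typing, here is a minimal sketch that builds the same condition twice on a small made-up DataFrame (the rows are hypothetical, not real records from the Chicago data): once with <b>isin()</b> and once by chaining one == comparison per crime type with the | (or) operator. The two resulting True/False masks are identical.

```python
import pandas as pd

# A tiny, made-up stand-in for chicago_crimes (hypothetical values)
crimes = pd.DataFrame({
    "Primary Type": ["THEFT", "ROBBERY", "BATTERY", "ASSAULT", "BURGLARY"],
    "Community Area": [29, 29, 66, 29, 4],
})

crime_types = ["THEFT", "BATTERY", "ASSAULT"]

# Mask built with isin(): True where the value appears in crime_types
mask_isin = crimes["Primary Type"].isin(crime_types)

# The same mask built by hand: one == comparison per value, joined with |
mask_or = (
    (crimes["Primary Type"] == "THEFT")
    | (crimes["Primary Type"] == "BATTERY")
    | (crimes["Primary Type"] == "ASSAULT")
)

print(mask_isin.tolist())          # [True, False, True, True, False]
print((mask_isin == mask_or).all())  # True
print(crimes[mask_isin])           # the THEFT, BATTERY and ASSAULT rows
```

With three values the difference is small, but the <b>isin()</b> version stays the same length no matter how many crime types the list contains.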


We can combine the <b>isin()</b> function with other operators. For example, let us get all crimes of types THEFT, BATTERY, and ASSAULT that have been committed in <i>Community Area</i> 29:

In [64]:
chicago_crimes[(chicago_crimes["Primary Type"].isin(["THEFT", "BATTERY", "ASSAULT"])) & (chicago_crimes["Community Area"]==29)]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
22,12788778,JF350628,08/09/2022 09:00:00 AM,033XX W 13TH ST,0890,THEFT,FROM BUILDING,LIBRARY,False,False,...,24.0,29,06,1154098.0,1893840.0,2022,01/03/2023 03:46:28 PM,41.864526,-87.709784,"(41.864525633, -87.709784193)"
35,12796489,JF359613,08/17/2022 02:00:00 AM,039XX W OGDEN AVE,0460,BATTERY,SIMPLE,GAS STATION,False,False,...,22.0,29,08B,1150286.0,1889027.0,2022,01/03/2023 03:46:28 PM,41.851393,-87.723904,"(41.851393305, -87.723903515)"
37,12796098,JF359198,08/16/2022 05:20:00 PM,033XX W ROOSEVELT RD,0560,ASSAULT,SIMPLE,RESTAURANT,True,False,...,24.0,29,08A,1154087.0,1894508.0,2022,01/03/2023 03:46:28 PM,41.866359,-87.709807,"(41.866358918, -87.709806763)"
86,12804684,JF369234,08/24/2022 06:30:00 PM,035XX W CERMAK RD,0870,THEFT,POCKET-PICKING,SIDEWALK,False,False,...,24.0,29,06,1152918.0,1889147.0,2022,01/03/2023 03:46:28 PM,41.851671,-87.714240,"(41.851670934, -87.714240208)"
144,12818261,JF385387,08/10/2022 09:35:00 PM,013XX S KOMENSKY AVE,0880,THEFT,PURSE-SNATCHING,ALLEY,False,False,...,24.0,29,06,1149595.0,1893535.0,2022,01/03/2023 03:46:28 PM,41.863777,-87.726323,"(41.863777237, -87.7263227)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237679,12934815,JF525187,12/15/2022 12:00:00 AM,027XX W 15TH ST,0820,THEFT,$500 AND UNDER,HOSPITAL BUILDING / GROUNDS,False,False,...,28.0,29,06,1158464.0,1892617.0,2022,01/03/2023 03:46:28 PM,41.861081,-87.693790,"(41.861081457, -87.693790049)"
237734,12925997,JF514296,12/17/2022 06:28:00 PM,036XX W OGDEN AVE,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,24.0,29,04A,1152598.0,1890161.0,2022,01/03/2023 03:46:28 PM,41.854460,-87.715388,"(41.854459796, -87.715387915)"
237877,12923943,JF511793,12/15/2022 03:58:00 PM,011XX S HOMAN AVE,0560,ASSAULT,SIMPLE,SCHOOL - PUBLIC BUILDING,True,False,...,24.0,29,08A,1153879.0,1894754.0,2022,01/03/2023 03:46:28 PM,41.867038,-87.710564,"(41.867038112, -87.710563803)"
237904,12930839,JF520302,12/22/2022 06:59:00 PM,011XX S CENTRAL PARK AVE,0486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,...,24.0,29,08B,1152544.0,1894972.0,2022,01/03/2023 03:46:28 PM,41.867663,-87.715459,"(41.8676628, -87.715459036)"
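
The following is a minimal sketch of the same kind of query on a hypothetical five-row DataFrame rather than the real data. It shows the inline form used above and an equivalent version that stores each condition in a variable first, which can be easier to read. The parentheses around the == comparison are required: in Python, & binds more tightly than ==, so without them the expression would be parsed differently.

```python
import pandas as pd

# Hypothetical miniature version of chicago_crimes
crimes = pd.DataFrame({
    "Primary Type": ["THEFT", "ROBBERY", "BATTERY", "ASSAULT", "THEFT"],
    "Community Area": [29, 29, 66, 29, 4],
})

# Inline form: the == comparison must be parenthesised because & binds more
# tightly than ==; the parentheses around isin() are optional but kept for symmetry
inline = crimes[
    (crimes["Primary Type"].isin(["THEFT", "BATTERY", "ASSAULT"]))
    & (crimes["Community Area"] == 29)
]

# Equivalent form: build each True/False mask first, then combine them with &
in_types = crimes["Primary Type"].isin(["THEFT", "BATTERY", "ASSAULT"])
in_area_29 = crimes["Community Area"] == 29
named = crimes[in_types & in_area_29]

print(inline.index.tolist())  # [0, 3]
print(inline.equals(named))   # True
```

Only rows 0 (THEFT in area 29) and 3 (ASSAULT in area 29) satisfy both conditions; the BATTERY row is in area 66 and the second THEFT row is in area 4.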


### The isna() Method
If you recall, we had previously used the <b>isna()</b> function to get the values that were null. This function can also be used as a condition:

In [65]:
chicago_crimes[chicago_crimes["Location Description"].isna()]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
749,12814348,JF380344,08/27/2022 08:35:00 PM,018XX N MILWAUKEE AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,32.0,22,11,,,2022,09/03/2022 04:48:29 PM,,,
1258,12624817,JF152109,02/21/2022 01:00:00 AM,044XX S BERKELEY AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,4.0,39,11,,,2022,02/28/2022 03:47:25 PM,,,
1305,12941312,JG102678,12/22/2022 10:35:00 AM,032XX W MADISON ST,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,,False,False,...,28.0,27,11,,,2022,01/04/2023 03:49:45 PM,,,
1316,12624795,JF152108,02/21/2022 12:40:00 AM,044XX S BERKELEY AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,4.0,39,11,,,2022,02/28/2022 03:47:25 PM,,,
1373,12624799,JF152283,02/22/2022 05:00:00 AM,048XX N LAVERGNE AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,45.0,11,11,,,2022,03/01/2022 03:51:28 PM,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238182,12939131,JG100330,08/31/2022 01:00:00 PM,017XX W IRVING PARK RD,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,47.0,6,11,1163856.0,1926604.0,2022,01/03/2023 03:46:28 PM,41.954232,-87.673037,"(41.954232251, -87.673036778)"
238196,12936303,JF526006,12/16/2022 08:30:00 AM,0000X N ABERDEEN ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,27.0,28,11,1169140.0,1900400.0,2022,01/03/2023 03:46:28 PM,41.882214,-87.654375,"(41.882213693, -87.654374823)"
238204,12935269,JF524782,12/06/2022 04:20:00 PM,001XX E 44TH ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,3.0,38,11,1178021.0,1875881.0,2022,01/03/2023 03:46:28 PM,41.814734,-87.622509,"(41.814734427, -87.622509357)"
238259,12935258,JF524780,12/12/2022 06:20:00 PM,074XX W PALATINE AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,41.0,10,11,1125839.0,1941361.0,2022,01/03/2023 03:46:28 PM,41.995446,-87.812465,"(41.995445746, -87.812464832)"


Notice that the value of <i>Location Description</i> is null in every returned row. This is because the function <b>isna()</b> returns True when a value is missing (NaN or None) and False otherwise; a value such as 0 or an empty string is not considered missing. Rows where the result is True satisfy the condition and are therefore shown.
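
To see exactly which values <b>isna()</b> treats as missing, here is a minimal sketch on a small made-up Series and DataFrame (the values are hypothetical, not taken from the Chicago data). Only None and NaN are flagged; 0 and the empty string count as ordinary values.

```python
import numpy as np
import pandas as pd

# Two genuinely missing entries (None, np.nan) mixed with 0 and "", which are not missing
values = pd.Series(["STREET", None, 0, np.nan, ""])
print(values.isna().tolist())  # [False, True, False, True, False]

# The same mask used as a filter condition on a hypothetical DataFrame
locations = pd.DataFrame({
    "Primary Type": ["THEFT", "DECEPTIVE PRACTICE", "BATTERY"],
    "Location Description": ["STREET", np.nan, "APARTMENT"],
})
missing = locations[locations["Location Description"].isna()]
print(missing)  # only the DECEPTIVE PRACTICE row, whose location is NaN
```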

What if we wanted to return the rows where the value was not missing? Once again, <i>pandas</i> has what we need:

In [66]:
chicago_crimes[chicago_crimes["Location Description"].notna()]

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,12789052,JF350580,08/09/2022 04:07:00 PM,014XX W ELMDALE AVE,0325,ROBBERY,VEHICULAR HIJACKING,STREET,True,False,...,48.0,77,03,1165640.0,1939961.0,2022,01/03/2023 03:46:28 PM,41.990846,-87.666096,"(41.990846423, -87.666096144)"
1,12790581,JF352712,08/10/2022 04:00:00 PM,062XX S ARTESIAN AVE,0810,THEFT,OVER $500,STREET,False,False,...,16.0,66,06,1161110.0,1863210.0,2022,01/03/2023 03:46:28 PM,41.780331,-87.684892,"(41.780330681, -87.684891779)"
2,12790652,JF352659,08/11/2022 10:00:00 AM,094XX S STATE ST,0810,THEFT,OVER $500,STREET,False,True,...,9.0,49,06,1177962.0,1842197.0,2022,01/03/2023 03:46:28 PM,41.722303,-87.623745,"(41.722303228, -87.623745129)"
3,12796135,JF359082,08/15/2022 09:14:00 PM,048XX S KARLOV AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,False,...,14.0,57,08A,1149844.0,1872244.0,2022,01/03/2023 03:46:28 PM,41.805347,-87.725961,"(41.805347066, -87.725961264)"
4,12795972,JF359058,08/16/2022 04:10:00 PM,015XX S HALSTED ST,0820,THEFT,$500 AND UNDER,SIDEWALK,False,False,...,11.0,28,06,1171290.0,1892413.0,2022,01/03/2023 03:46:28 PM,41.860250,-87.646715,"(41.860249838, -87.64671467)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238311,12937237,JF527290,12/23/2022 08:41:00 AM,064XX W Irving Park Rd,0860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,False,False,...,38.0,17,06,1132626.0,1925898.0,2022,01/03/2023 03:46:28 PM,41.952898,-87.787861,"(41.952897791, -87.787860507)"
238313,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,32.0,5,14,1161968.0,1923233.0,2022,01/03/2023 03:46:28 PM,41.945022,-87.680072,"(41.945021752, -87.680071764)"
238314,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,...,47.0,4,05,1158237.0,1929586.0,2022,01/03/2023 03:46:28 PM,41.962532,-87.693611,"(41.962531969, -87.693611152)"
238315,12935341,JF525383,12/20/2022 06:45:00 AM,027XX W ROOSEVELT RD,0810,THEFT,OVER $500,STREET,False,False,...,28.0,29,06,1158071.0,1894595.0,2022,01/03/2023 03:46:28 PM,41.866517,-87.695179,"(41.866517317, -87.695178701)"
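
Since <b>notna()</b> is the element-wise opposite of <b>isna()</b>, every row ends up in exactly one of the two filtered results. The sketch below checks this on a hypothetical four-row DataFrame standing in for <i>chicago_crimes</i>; it also shows that negating <b>isna()</b> with the ~ operator gives the same mask as <b>notna()</b>.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset with one missing location
crimes = pd.DataFrame({
    "Primary Type": ["THEFT", "DECEPTIVE PRACTICE", "BATTERY", "ROBBERY"],
    "Location Description": ["STREET", np.nan, "APARTMENT", "STREET"],
})

col = crimes["Location Description"]

# notna() and ~isna() produce the same True/False mask
print((col.notna() == ~col.isna()).all())  # True

present = crimes[col.notna()]
missing = crimes[col.isna()]

# The two subsets split the rows between them with no overlap
print(len(present), len(missing), len(crimes))  # 3 1 4
```

Checking that the two counts add up to the total number of rows is a quick way to confirm that a column's missing values have been accounted for.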


# Conclusion
This is the end of the free sample of the course. If you have made it this far and think that the course is beneficial for you, then please enroll in the full paid version, which you can find here: https://www.udemy.com/course/becoming-a-data-analyst-using-python-from-scratch