### Course Description
Begin your journey into Data Science! Even if you've never written a line of code in your life, you'll be able to follow this course and witness the power of Python to perform Data Science. You'll use data to solve the mystery of Bayes, the kidnapped Golden Retriever, and along the way you'll become familiar with basic Python syntax and popular Data Science modules like Matplotlib (for charts and graphs) and Pandas (for tabular data).

# Chapter 1 - Getting Started in Python
Welcome to the wonderful world of Data Analysis in Python! In this chapter, you'll learn the basics of Python syntax, load your first Python modules, and use functions to get a suspect list for the kidnapping of Bayes, DataCamp's prize-winning Golden Retriever.

### Importing Python modules

Modules (sometimes called packages or libraries) help group together related sets of tools in Python. In this exercise, we'll examine two modules that are frequently used by Data Scientists:

1. `statsmodels` : used in machine learning; usually aliased as `sm`
2. `seaborn` : a visualization library; usually aliased as `sns`

Note that each module has a standard alias, which allows you to access the tools inside of the module without typing as many characters. For example, aliasing lets us shorten `seaborn.scatterplot()` to `sns.scatterplot()`.
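Aliasing works the same way for any module, not just `seaborn`. Here is a minimal sketch using the standard-library `statistics` module, so it runs without installing anything. The alias `st` is chosen purely for illustration and is not a community convention.

```python
# Import the standard-library statistics module under a short alias.
# The alias "st" is an illustrative choice, not a standard convention.
import statistics as st

# The alias refers to the same module, so its tools are reached through it
average = st.mean([2, 4, 6])
print(average)
```

Without the alias, the same call would be written as `statistics.mean([2, 4, 6])`.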

In [None]:
# Use an import statement to import statsmodels
import statsmodels

# Import statsmodels under the alias sm
import statsmodels as sm

# Use an import statement to import seaborn with alias sns
import seaborn as sns

### Correcting a broken import

In this exercise, we'll learn to import `numpy`, a module for performing mathematical operations on lists of data. The standard alias for `numpy` is `np`.

What did you need to change to make the import run without errors?

In [None]:
# Fix the import of numpy to run without errors
import numpy as np

# Question: What did you need to change to make the import run without errors?
# Answer: Python is case-sensitive, so `numpy` must be all lowercase

### Creating a float
Before we start looking for Bayes' kidnapper, we need to fill out a Missing Puppy Report with details of the case. Each piece of information will be stored as a variable.

We define a variable using an equals sign (`=`). For instance, we would define the variable `height` :

`height = 24`

In this exercise, we'll be defining `bayes_age` to be `4.0` months old. The data type for this variable will be float, meaning that it is a number that can have a fractional part.
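If you're curious how Python tells a whole number from a float, the built-in `type()` function reports the data type of any value. A short sketch, reusing the `height` and `bayes_age` values from the text:

```python
height = 24        # no decimal point: Python stores this as an int
bayes_age = 4.0    # decimal point: Python stores this as a float

print(type(height))
print(type(bayes_age))

# An int and a float holding the same value still compare as equal
same_value = (4 == 4.0)
print(same_value)
```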

In [None]:
# Define a variable called bayes_age and set it equal to 4.0
bayes_age = 4.0

# Display the variable bayes_age
print(bayes_age)

### Creating strings
Let's continue to fill out the Missing Puppy Report for Bayes. In the previous exercise, we defined `bayes_age`, which was a *float*, which represents a number.

In this exercise, we'll define `favorite_toy` and `owner`, which will both be *strings*. A *string* represents text. A *string* is surrounded by quotation marks (`'` or `"`) and can contain letters, numbers, and special characters. It doesn't matter if you use single (`'`) or double (`"`) quotes, but it's important to be consistent throughout your code.
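The sketch below illustrates the points above: both quote styles create the same string, double quotes are handy when the text contains an apostrophe, and digits inside quotes are text rather than numbers. The names `report_title` and `case_number` are made up for this example.

```python
# Single and double quotes produce the same string
owner_single = 'DataCamp'
owner_double = "DataCamp"
quotes_match = (owner_single == owner_double)
print(quotes_match)

# Double quotes are convenient when the text itself contains an apostrophe
report_title = "Bayes' Missing Puppy Report"
print(report_title)

# Digits inside quotes are text, not a number
case_number = "123"
print(type(case_number))
```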

In [None]:
# Bayes' favorite toy
favorite_toy = "Mr. Squeaky"

# Bayes' owner
owner = 'DataCamp'

# Display variables
print(favorite_toy)
print(owner)

### Correcting string errors
It's easy to make errors when you're trying to type strings quickly.

* Don't forget to use quotes! Without quotes, you'll get a `NameError`.

`owner = DataCamp`

* Use the same type of quotation mark. If you start with one type of quote and end with the other, you'll get a `SyntaxError`.

`fur_color = "blonde'`
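To see both errors without crashing a script, we can catch them. This is only a sketch: the mismatched-quote line is stored as text and passed to the built-in `compile()` function, because Python refuses to even start a file that contains such a line.

```python
# Missing quotes: Python treats DataCamp as a variable name that does not exist
try:
    owner = DataCamp
except NameError as err:
    name_error_message = str(err)
    print(name_error_message)

# Mismatched quotes cannot be parsed at all, so the line is kept as text here
# and handed to compile() to trigger the error safely
broken_line = "fur_color = \"blonde'"
try:
    compile(broken_line, "<example>", "exec")
    raised_syntax_error = False
except SyntaxError:
    raised_syntax_error = True
print(raised_syntax_error)
```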

Someone at the police station made an error when filling out the final lines of Bayes' Missing Puppy Report. In this exercise, you will correct the errors.

In [None]:
# One or more of the following lines contains an error
# Correct it so that it runs without producing syntax errors
birthday = "2017-07-14"
case_id = "DATACAMP!123-456?"

### Valid variable names
Which of the following is not a valid variable name?

* my_dog_bayes
* BAYES42
* `3dogs`
* this_is_a_very_long_variable_name_42
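One way to check an answer like this is the string method `isidentifier()`, which reports whether a piece of text follows Python's naming rules. A small sketch over the four candidates above; note that `isidentifier()` does not flag reserved words such as `class`.

```python
candidates = [
    "my_dog_bayes",
    "BAYES42",
    "3dogs",
    "this_is_a_very_long_variable_name_42",
]

# str.isidentifier() is True for names made of letters, digits, and
# underscores that do not start with a digit
validity = {name: name.isidentifier() for name in candidates}

for name, is_valid in validity.items():
    print(name, is_valid)
```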

### Load a DataFrame
A ransom note was left at the scene of Bayes' kidnapping. Eventually, we'll want to analyze the frequency with which each letter occurs in the note, to help us identify the kidnapper. For now, we just need to load the data from `ransom.csv` into Python.

We'll load the data into a DataFrame, a special *data type* from the `pandas` module. It represents spreadsheet-like data (something with rows and columns).

We can create a DataFrame from a CSV (comma-separated value) file by using the function `pd.read_csv`.
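If you don't have a CSV file handy, `pd.read_csv` also accepts a file-like object, which makes it easy to experiment. The sketch below uses made-up CSV text (the column names and values are assumptions, not the contents of `ransom.csv`).

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "letter,frequency\nA,7.38\nB,4.76\nC,6.35\n"

# pd.read_csv accepts a file path or a file-like object such as StringIO
example = pd.read_csv(io.StringIO(csv_text))

print(example)
print(example.shape)  # (number of rows, number of columns)
```

The first line of the CSV becomes the column names, and each following line becomes one row.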

In [None]:
# Import pandas
import pandas as pd

# Load the 'ransom.csv' into a DataFrame
r = pd.read_csv('ransom.csv')

# Display DataFrame
print(r)

### Correcting a function error
The code in the script editor should plot information from the DataFrame that we loaded in the previous exercise.

However, there is an error in function syntax. Remember that common function errors include:
* Forgetting closing parenthesis
* Forgetting commas between each argument

Note that all arguments to the functions are correct. The problem is in the function syntax.
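Both mistakes are syntax errors, which Python detects before running any code. The sketch below checks three hypothetical one-line snippets with the built-in `compile()` function; the helper `has_valid_syntax` is invented for this illustration.

```python
def has_valid_syntax(source):
    """Return True if Python can parse the source text, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Hypothetical snippets; nothing is executed, the text is only parsed
well_formed = has_valid_syntax("total = max(3, 7)")
missing_parenthesis = has_valid_syntax("total = max(3, 7")
missing_comma = has_valid_syntax("total = max(3 7)")

print(well_formed, missing_parenthesis, missing_comma)
```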

In [None]:
# One or more of the following lines contains an error
# Correct it so that it runs without producing syntax errors

# Plot a graph
plt.plot(x_values, y_values)

# Display the graph
plt.show()

### Snooping for suspects
We need to narrow down the list of suspects for the kidnapping of Bayes. Once we have a list of suspects, we'll ask them for writing samples and compare them to the ransom note.

A witness to the crime noticed a green truck leaving the scene of the crime whose license plate began with 'FRQ'. We'll use this information to search for some suspects.

As a detective, you have access to a special function called `lookup_plate`.

`lookup_plate` accepts one positional argument: A string representing a license plate.

Create a variable called `plate` that represents the observed license plate: the first three letters were `FRQ`, but the witness couldn't see the final 4 letters. Use asterisks (`*`) to represent missing letters.
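To get a feel for what a pattern like this could match, here is a rough sketch of a matcher in which each `*` stands for exactly one character. The helper `plate_matches` and the sample plates are hypothetical; this is not how `lookup_plate` is implemented.

```python
import re

def plate_matches(pattern, plate):
    """Check a plate against a pattern in which each * stands for one character.

    Hypothetical helper for illustration; it is not the lookup_plate function.
    """
    # Escape the pattern, then turn each escaped * into "any single character"
    regex = re.escape(pattern).replace(r"\*", ".")
    return re.fullmatch(regex, plate) is not None

plate = "FRQ****"

# Made-up plates used only to exercise the helper
print(plate_matches(plate, "FRQ1234"))
print(plate_matches(plate, "ABC1234"))
print(plate_matches(plate, "FRQ12"))
```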

In [None]:
# Define plate to represent a plate beginning with FRQ
# Use * to represent the missing four letters
plate = "FRQ****"

# Chapter 2 - Loading Data in pandas
In this chapter, you'll learn a powerful Python library: pandas. Pandas lets you read, modify, and search tabular datasets (like spreadsheets and database tables). You'll examine credit card records for the suspects and see if any of them made suspicious purchases.

### Loading a DataFrame
We're still working hard to solve the kidnapping of Bayes, the Golden Retriever. Previously, we used a license plate spotted at the crime scene to narrow the list of suspects to:

* Fred Frequentist
* Ronald Aylmer Fisher
* Gertrude Cox
* Kirstine Smith

We've obtained credit card records for all four suspects. Perhaps some of them made suspicious purchases before the kidnapping?

The records are in a CSV called `"credit_records.csv"`.

In [None]:
# Import pandas under the alias pd
import pandas as pd

# Load the CSV "credit_records.csv"
credit_records = pd.read_csv('credit_records.csv')

# Display the first five rows of credit_records using the .head() method
print(credit_records.head())

### Inspecting a DataFrame
We've loaded the credit card records of our four suspects into a DataFrame called `credit_records`. Let's learn more about the structure of this DataFrame.

The `pandas` module has been imported under the alias `pd`. The DataFrame `credit_records` has already been imported.

How many rows are in `credit_records`?

In [None]:
# Use .info() to inspect the DataFrame credit_records
# .info() prints its summary itself and returns None, so no print() is needed
credit_records.info()

### Two methods for selecting columns
Once again, we've loaded the credit card records of our four suspects into a DataFrame called `credit_records`. Let's examine the items that they've purchased.

The `pandas` module has been imported under the alias `pd`. The DataFrame `credit_records` has already been imported.

In [None]:
# Select the column item from credit_records
# Use brackets and string notation
items = credit_records["item"]

# Display the results
print(items)

# Select the column item from credit_records
# Use dot notation
items = credit_records.item

# Display the results
print(items)

### Correcting column selection errors
A junior detective tried to access the location column of `credit_records`, but he made some mistakes. Help correct his code so that we can search for suspicious purchases.

In all exercises going forward, `pandas` will be imported as `pd`. The DataFrame `credit_records` has already been imported.

In [None]:
# One or more lines of code contain errors.
# Fix the errors so that the code runs.

# Select the location column in credit_records
location = credit_records['location']

# Select the item column in credit_records
items = credit_records.item

# Display results
print(location)

### More column selection mistakes
Another junior detective is examining a DataFrame of Missing Puppy Reports. He's made some mistakes that cause the code to fail.

The `pandas` module has been loaded under the alias `pd`, and the DataFrame is called `mpr`.

In [None]:
# Use info() to inspect mpr
# info() prints its summary itself and returns None, so no print() is needed
mpr.info()

# The following code contains one or more errors
# Correct the mistakes in the code so that it runs without errors

# Select column "Dog Name" from mpr
name = mpr['Dog Name']

# Select column "Missing?" from mpr
is_missing = mpr['Missing?']

# Display the columns
print(name)
print(is_missing)

# Question: Why did this code generate an error?
# name = mpr.Dog Name
# Answer: If a column name contains a space, then it needs to be in brackets and string notation.

### Logical testing
Let's practice writing logical statements and displaying the output.

Recall that we use the following operators:

* `==` tests that two values are equal.
* `!=` tests that two values are not equal.
* `>` and `<` test greater than or less than, respectively.
* `>=` and `<=` test greater than or equal to or less than or equal to, respectively.
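Each of these operators produces either `True` or `False`. A quick sketch with made-up values (the numbers and strings below are assumptions chosen to exercise each operator):

```python
# Hypothetical values, chosen only to exercise each operator
height_inches = 65
plate1 = "FRQ123"
fur_color = "blonde"

print(height_inches > 70)
print(height_inches <= 65)
print(plate1 == "FRQ123")
print(fur_color != "brown")

# The result of a comparison is a bool, which can be stored like any value
is_tall = height_inches > 70
print(type(is_tall))
```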

In [None]:
# Is height_inches greater than 70 inches?
print(height_inches > 70)

# Is plate1 equal to "FRQ123"?
print(plate1 == "FRQ123")

# Is fur_color not equal to "brown"?
print(fur_color != "brown")

### Selecting missing puppies
Let's return to our DataFrame of missing puppies, which is loaded as `mpr`. Let's select a few different rows to learn more about the other missing dogs.
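Row selection works in two steps: a logical test on a column produces one `True`/`False` value per row, and placing that result inside brackets keeps only the rows marked `True`. The sketch below uses a small made-up DataFrame, `mpr_example`, as a stand-in for `mpr`.

```python
import pandas as pd

# Hypothetical stand-in for the mpr DataFrame
mpr_example = pd.DataFrame({
    "Dog Name": ["Bayes", "Sigmoid", "Sparky"],
    "Age": [1, 3, 5],
    "Status": ["Still Missing", "Found", "Still Missing"],
})

# The comparison yields one True/False value per row
older_than_2 = mpr_example.Age > 2
print(older_than_2)

# Indexing with that result keeps only the rows marked True
print(mpr_example[older_than_2])
```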

In [None]:
# Select the dogs where Age is greater than 2
greater_than_2 = mpr[mpr.Age > 2]
print(greater_than_2)

# Select the dogs whose Status is equal to Still Missing
still_missing = mpr[mpr.Status == 'Still Missing']
print(still_missing)

# Select all dogs whose Dog Breed is not equal to Poodle
not_poodle = mpr[mpr['Dog Breed'] != 'Poodle']
print(not_poodle)

### Narrowing the list of suspects
In Chapter 1, we found a list of people whose cars matched the description of the one that kidnapped Bayes:

* Fred Frequentist
* Ronald Aylmer Fisher
* Gertrude Cox
* Kirstine Smith

We'd like to narrow this list down, so we obtained credit card records for each suspect. We'd like to know if any of them recently purchased dog treats to use in the kidnapping. If they did, they would have visited `'Pet Paradise'`.

The credit records have been loaded into a DataFrame called `credit_records`.

In [None]:
# Select purchases from 'Pet Paradise'
purchase = credit_records[credit_records.location == 'Pet Paradise']

# Display
print(purchase)

# Question: Which suspects purchased pet supplies before the kidnapping?
# Answer: Fred Frequentist and Gertrude Cox

# Chapter 3 - Plotting Data with matplotlib
Get ready to visualize your data! You'll create line plots with another Python module: matplotlib. Using line plots, you'll analyze the letter frequencies from the ransom note and several handwriting samples to determine the kidnapper.

### Working hard
Several police officers have been working hard to help us solve the mystery of Bayes, the kidnapped Golden Retriever. Their commanding officer wants to know exactly how hard each officer has been working on this case. Officer Deshaun has created a DataFrame called `deshaun` to track the amount of time he spent working on this case. The DataFrame contains two columns:

* `day_of_week` : a string representing the day of the week
* `hours_worked` : the number of hours that a particular officer worked on the Bayes case

In [None]:
# From matplotlib, import pyplot under the alias plt
from matplotlib import pyplot as plt

# Plot Officer Deshaun's hours_worked vs. day_of_week
plt.plot(deshaun.day_of_week, deshaun.hours_worked)

# Display Deshaun's plot
plt.show()

### Or hardly working?
Two other officers have been working with Deshaun to help find Bayes. Their names are Officer Mengfei and Officer Aditya. Deshaun used their time cards to create two more DataFrames: `mengfei` and `aditya`. In this exercise, we'll plot all three lines together to see who was working hard each day.

We've already loaded `matplotlib.pyplot` under the alias `plt`.

In [None]:
# Plot Officer Deshaun's hours_worked vs. day_of_week
plt.plot(deshaun.day_of_week, deshaun.hours_worked)

# Plot Officer Aditya's hours_worked vs. day_of_week
plt.plot(aditya.day_of_week, aditya.hours_worked)

# Plot Officer Mengfei's hours_worked vs. day_of_week
plt.plot(mengfei.day_of_week, mengfei.hours_worked)

# Display all three line plots
plt.show()

# Question: One of the officers was removed from the investigation on Wednesday because of an emergency at a different station house. That officer did not return on Thursday or Friday. Which color line represents that officer?
# Answer: Orange

### Adding a legend
Officers Deshaun, Mengfei, and Aditya have all been working with you to solve the kidnapping of Bayes. Their supervisor wants to know how much time each officer has spent working on the case.

Deshaun created a plot of data from the DataFrames `deshaun`, `mengfei`, and `aditya` in the previous exercise. Now he wants to add a legend to distinguish the three lines.

In [None]:
# 1) Using the keyword label, label Deshaun's plot as "Deshaun".
# Add a label to Deshaun's plot
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label = "Deshaun")

# Officer Aditya
plt.plot(aditya.day_of_week, aditya.hours_worked)

# Officer Mengfei
plt.plot(mengfei.day_of_week, mengfei.hours_worked)

# Display plot
plt.show()

# 2) Add labels to Mengfei's ("Mengfei") and Aditya's ("Aditya") plots.
# Officer Deshaun
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label='Deshaun')

# Add a label to Aditya's plot
plt.plot(aditya.day_of_week, aditya.hours_worked, label = 'Aditya')

# Add a label to Mengfei's plot
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label = 'Mengfei')

# Display plot
plt.show()

# 3) Nothing is displaying yet! Add a command to make the legend display.
# Officer Deshaun
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label='Deshaun')

# Add a label to Aditya's plot
plt.plot(aditya.day_of_week, aditya.hours_worked, label='Aditya')

# Add a label to Mengfei's plot
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label='Mengfei')

# Add a command to make the legend display
plt.legend()

# Display plot
plt.show()

# Question: One of the officers did not start working on the case until Wednesday. Which officer?
# Answer: Mengfei

### Adding labels
If we give a chart with no labels to Officer Deshaun's supervisor, she won't know what the lines represent.

We need to add labels to Officer Deshaun's plot of hours worked.

In [None]:
# Lines
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label='Deshaun')
plt.plot(aditya.day_of_week, aditya.hours_worked, label='Aditya')
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label='Mengfei')

# Add a title
plt.title("Officer's hours worked")

# Add y-axis label
plt.ylabel('Hours worked')

# Legend
plt.legend()
# Display plot
plt.show()

### Adding floating text
Officer Deshaun is examining the number of hours that he worked over the past six months. The number for June is low because he only had data for the first week. Help Deshaun add an annotation to the graph to explain this.

In [None]:
# Place the annotation "Missing June data" at the point (2.5, 80)
# Create plot
plt.plot(six_months.month, six_months.hours_worked)

# Add annotation "Missing June data" at (2.5, 80)
plt.text(2.5, 80, 'Missing June data')

# Display graph
plt.show()

### Tracking crime statistics
Sergeant Laura wants to do some background research to help her better understand the cultural context for Bayes' kidnapping. She has plotted Burglary rates in three U.S. cities using data from the [Uniform Crime Reporting Statistics](https://www.ucrdatatool.gov/Search/Crime/Local/LocalCrimeLarge.cfm).

She wants to present this data to her officers, and she wants the image to be as beautiful as possible to effectively tell her data story.

Recall:

* You can change `linestyle` to dotted (`':'`), dashed (`'--'`), or no line (`''`).
* You can change the `marker` to circle (`'o'`), diamond (`'d'`), or square (`'s'`).
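These keywords can be tried on any data. The sketch below uses made-up numbers and selects the non-interactive `Agg` backend (an assumption, so the code also runs where no window can be opened). In a notebook you would drop that line and finish with `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Hypothetical data points
years = [2000, 2001, 2002, 2003]
rates = [12, 15, 11, 14]

# plt.plot returns a list of line objects, one per plotted line
dotted_line, = plt.plot(years, rates, linestyle=":", label="dotted")
square_line, = plt.plot(years, [r + 5 for r in rates], marker="s", label="squares")
circles_only, = plt.plot(years, [r + 10 for r in rates],
                         linestyle="", marker="o", label="circles, no line")
plt.legend()

# Each line object remembers the style it was given
print(dotted_line.get_linestyle())
print(square_line.get_marker())
plt.close()
```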

In [None]:
# Change the color of Phoenix to `"DarkCyan"`
plt.plot(data["Year"], data["Phoenix Police Dept"], label="Phoenix", color='DarkCyan')

# Make the Los Angeles line dotted
plt.plot(data["Year"], data["Los Angeles Police Dept"], label="Los Angeles", linestyle=':')

# Add square markers to Philadelphia
plt.plot(data["Year"], data["Philadelphia Police Dept"], label="Philadelphia", marker='s')

# Add a legend
plt.legend()

# Display the plot
plt.show()

### Playing with styles
Sergeant Laura wants to try out a few different style options. Changing the plotting style is a fast way to change the entire look of your plot without having to update individual colors or line styles. Some popular styles include:

   * `'fivethirtyeight'` - Based on the color scheme of the popular website
   * `'grayscale'` - Great for when you don't have a color printer!
   * `'seaborn'` - Based on another Python visualization library
   * `'classic'` - The default color scheme for Matplotlib before version 2.0
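Style names can differ between Matplotlib versions (for instance, newer releases list the seaborn-based styles under names such as `'seaborn-v0_8'`), so it can help to check `plt.style.available` before choosing one. A hedged sketch with made-up data, using the non-interactive `Agg` backend as an assumption so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# List the style names that this Matplotlib installation provides
available = plt.style.available
print(len(available), "styles available")

wanted = "ggplot"
if wanted in available:
    # plt.style.context applies the style only inside the with-block,
    # leaving later plots unaffected
    with plt.style.context(wanted):
        plt.plot([1, 2, 3], [2, 4, 3])
        plt.close()
```

Unlike `plt.style.use()`, which changes the style for every plot that follows, `plt.style.context()` is temporary.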

In [None]:
# Change the style to fivethirtyeight
plt.style.use('fivethirtyeight')

# Plot lines
plt.plot(data["Year"], data["Phoenix Police Dept"], label="Phoenix")
plt.plot(data["Year"], data["Los Angeles Police Dept"], label="Los Angeles")
plt.plot(data["Year"], data["Philadelphia Police Dept"], label="Philadelphia")

# Add a legend
plt.legend()

# Display the plot
plt.show()

In [None]:
# Change the style to ggplot
plt.style.use('ggplot')

# Plot lines
plt.plot(data["Year"], data["Phoenix Police Dept"], label="Phoenix")
plt.plot(data["Year"], data["Los Angeles Police Dept"], label="Los Angeles")
plt.plot(data["Year"], data["Philadelphia Police Dept"], label="Philadelphia")

# Add a legend
plt.legend()

# Display the plot
plt.show()

In [None]:
# View all styles by typing print(plt.style.available) in the console
# Choose any of the styles
plt.style.use('grayscale')

# Plot lines
plt.plot(data["Year"], data["Phoenix Police Dept"], label="Phoenix")
plt.plot(data["Year"], data["Los Angeles Police Dept"], label="Los Angeles")
plt.plot(data["Year"], data["Philadelphia Police Dept"], label="Philadelphia")

# Add a legend
plt.legend()

# Display the plot
plt.show()

### Identifying Bayes' kidnapper
We've narrowed the possible kidnappers down to two suspects:

* Fred Frequentist (`suspect1`)
* Gertrude Cox (`suspect2`)

The kidnapper left a long ransom note containing several unusual phrases. Help DataCamp by using a line plot to compare the frequency of letters in the ransom note to samples from the two main suspects.

Three DataFrames have been loaded:

* `ransom` contains the letter frequencies for the ransom note.
* `suspect1` contains the letter frequencies for the sample from Fred Frequentist.
* `suspect2` contains the letter frequencies for the sample from Gertrude Cox.

Each DataFrame contains two columns: `letter` and `frequency`.
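The course does not show how these letter frequencies were computed. One plausible approach, sketched below with a made-up sentence, is to count each letter and express the count as a percentage of all letters. The sample text and the choice of percentages are assumptions.

```python
from collections import Counter

# Hypothetical sample text; the real ransom note is not reproduced here
sample_text = "Bring the ransom to the park by noon"

# Keep letters only and ignore case so that 'B' and 'b' count together
letters = [ch.upper() for ch in sample_text if ch.isalpha()]
counts = Counter(letters)

# Convert raw counts to percentages of all letters
total = len(letters)
frequency = {letter: 100 * n / total for letter, n in sorted(counts.items())}

for letter, pct in frequency.items():
    print(letter, round(pct, 1))
```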

In [None]:
# Plot the letter frequencies from the ransom note. The x-values should be ransom.letter. The y-values should be ransom.frequency. The label should be the string 'Ransom'. The line should be dotted and gray.
# x should be ransom.letter and y should be ransom.frequency
plt.plot(ransom.letter, ransom.frequency,
         # Label should be "Ransom"
         label="Ransom",
         # Plot the ransom letter as a dotted gray line
         linestyle=':', color='gray')

# Display the plot
plt.show()

In [None]:
# Plot a line for the data in suspect1. Use a keyword argument to label that line 'Fred Frequentist'.
# Plot each line
plt.plot(ransom.letter, ransom.frequency,
         label='Ransom', linestyle=':', color='gray')

# X-values should be suspect1.letter
# Y-values should be suspect1.frequency
# Label should be "Fred Frequentist"
plt.plot(suspect1.letter, suspect1.frequency, label='Fred Frequentist')

# Display the plot
plt.show()

In [None]:
# Plot a line for the data in suspect2 (labeled 'Gertrude Cox').
# Plot each line
plt.plot(ransom.letter, ransom.frequency,
         label='Ransom', linestyle=':', color='gray')
plt.plot(suspect1.letter, suspect1.frequency,
         label='Fred Frequentist')

# X-values should be suspect2.letter
# Y-values should be suspect2.frequency
# Label should be "Gertrude Cox"
plt.plot(suspect2.letter, suspect2.frequency, label='Gertrude Cox')

# Display plot
plt.show()

In [None]:
# Label the x-axis (Letter) and the y-axis (Frequency), and add a legend.
# Plot each line
plt.plot(ransom.letter, ransom.frequency,
         label='Ransom', linestyle=':', color='gray')
plt.plot(suspect1.letter, suspect1.frequency, label='Fred Frequentist')
plt.plot(suspect2.letter, suspect2.frequency, label='Gertrude Cox')

# Add x- and y-labels
plt.xlabel("Letter")
plt.ylabel("Frequency")

# Add a legend
plt.legend()

# Display plot
plt.show()

# Chapter 4 - Different Types of Plots
In this final chapter, you'll learn how to create three new plot types: scatter plots, bar plots, and histograms. You'll use these tools to locate where the kidnapper is hiding and rescue Bayes, the Golden Retriever.

### Charting cellphone data
We know that Fred Frequentist is the one who kidnapped Bayes the Golden Retriever. Now we need to learn where he is hiding.

Our friends at the police station have acquired cell phone data, which gives some of Fred's locations over the past three weeks. It's stored in the DataFrame `cellphone`. The x-coordinates are in the column `'x'` and the y-coordinates are in the column `'y'`.

`matplotlib.pyplot` has been imported under the alias `plt`.

In [None]:
# Explore the data
print(cellphone.head())

# Create a scatter plot of the data from the DataFrame cellphone
plt.scatter(cellphone.x, cellphone.y)

# Add labels
plt.ylabel('Longitude')
plt.xlabel('Latitude')

# Display the plot
plt.show()

### Modifying a scatterplot
In the previous exercise, we created a scatter plot to show Fred Frequentist's cell phone data.

In this exercise, we've done some magic so that the plot will appear over a map of our town. If we just plot the data as we did before, we won't be able to see the map or pick out the areas with the most points. We can fix this by changing the colors, markers, and transparency of the scatter plot.

As before, `matplotlib.pyplot` has been imported under the alias `plt`, and the cellphone data is in the DataFrame `cellphone`.

In [None]:
# Change the marker color to red
plt.scatter(cellphone.x, cellphone.y,
           color='red')

# Add labels
plt.ylabel('Longitude')
plt.xlabel('Latitude')

# Display the plot
plt.show()

In [None]:
# Change the marker shape to square
plt.scatter(cellphone.x, cellphone.y,
           color='red',
           marker='s')

# Add labels
plt.ylabel('Longitude')
plt.xlabel('Latitude')

# Display the plot
plt.show()

In [None]:
# Change the transparency to 0.1
plt.scatter(cellphone.x, cellphone.y,
           color='red',
           marker='s',
           alpha=0.1)

# Add labels
plt.ylabel('Longitude')
plt.xlabel('Latitude')

# Display the plot
plt.show()

### Build a simple bar chart
Officer Deshaun wants to plot the average number of hours worked per week for him and his coworkers. He has stored the hours worked in a DataFrame called `hours`, which has columns `officer` and `avg_hours_worked`. Recall that the function `plt.bar()` takes two arguments: the labels for each bar, and the height of each bar. Both of these can be found in our DataFrame.
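To try `plt.bar()` without the `hours` DataFrame, plain lists work too. In the sketch below, the officers' hours are made-up numbers, and the `Agg` backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Hypothetical labels and bar heights
officers = ["Deshaun", "Mengfei", "Aditya"]
avg_hours = [33, 33, 30]

# First argument: the label for each bar; second argument: the bar heights
bars = plt.bar(officers, avg_hours)

# plt.bar returns one rectangle per label, each with the requested height
heights = [bar.get_height() for bar in bars]
print(heights)
plt.close()
```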

In [None]:
# Create a bar chart of the column avg_hours_worked for each officer from the DataFrame hours.
# Display the DataFrame hours using print
print(hours)

# Create a bar plot from the DataFrame hours
plt.bar(hours.officer, hours.avg_hours_worked)

# Display the plot
plt.show()

# Use the column std_hours_worked (the standard deviation of the hours worked) to add error bars to the bar chart.
# Display the DataFrame hours using print
print(hours)

# Create a bar plot from the DataFrame hours
plt.bar(hours.officer, hours.avg_hours_worked,
        # Add error bars
        yerr=hours.std_hours_worked)

# Display the plot
plt.show()

### Where did the time go?
Officer Deshaun wants to compare the hours spent on field work and desk work between him and his colleagues. In this DataFrame, he has split out the average hours worked per week into `desk_work` and `field_work`.

You can use the same DataFrame containing the hours worked from the previous exercise (`hours`).

In [None]:
# Create a bar plot for field_work whose bottom is the height of desk_work. Label the field_work bars as "Field Work" and add a legend.
# Plot the number of hours spent on desk work
plt.bar(hours.officer, hours.desk_work, label='Desk Work')

# Plot the hours spent on field work on top of desk work
plt.bar(hours.officer, hours.field_work, bottom=hours.desk_work, label='Field Work')

# Add a legend
plt.legend()

# Display the plot
plt.show()

### Modifying histograms
Let's explore how changes to keyword parameters in a histogram can change the output. Recall that:

* `range` sets the minimum and maximum datapoints that we will include in our histogram.
* `bins` sets the number of bins (bars) in our histogram.
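The effect of both keywords can be read from the values that `plt.hist()` returns. A sketch with made-up weights (the `Agg` backend is an assumption so that no window is needed):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Hypothetical puppy weights in pounds
weights = [3, 6, 8, 12, 14, 15, 21, 22, 29, 34, 40]

# plt.hist returns the bar heights, the bin edges, and the drawn bars
counts, edges, patches = plt.hist(weights, bins=3, range=(5, 35))

print(counts)  # one count per bin
print(edges)   # bins + 1 edges, running from the range minimum to maximum
plt.close()
```

The weights 3 and 40 fall outside the range, so they are not counted in any bin.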

We'll be exploring the weights of various puppies from the DataFrame `puppy_weight`. `matplotlib.pyplot` has been loaded under the alias `plt`.

In [None]:
# Create a histogram of the column weight from the DataFrame puppy_weight
plt.hist(puppy_weight.weight)

# Add labels
plt.xlabel('Puppy Weight (lbs)')
plt.ylabel('Number of Puppies')

# Display
plt.show()

In [None]:
# Change the number of bins to 50
plt.hist(puppy_weight.weight,
        bins=50)

# Add labels
plt.xlabel('Puppy Weight (lbs)')
plt.ylabel('Number of Puppies')

# Display
plt.show()

In [None]:
# Change the range to start at 5 and end at 35
plt.hist(puppy_weight.weight,
        range=(5, 35))

# Add labels
plt.xlabel('Puppy Weight (lbs)')
plt.ylabel('Number of Puppies')

# Display
plt.show()

### Heroes with histograms
We've identified that the kidnapper is Fred Frequentist. Now we need to know where Fred is hiding Bayes.

A shoe print at the crime scene contains a specific type of gravel. Based on the distribution of gravel radii, we can determine where the kidnapper recently visited. It might be:

![blue-meadows-park](https://assets.datacamp.com/production/repositories/3582/datasets/18d918fc771de7adde6d7b5cc5ad8d63394aad3d/blue_meadows_park.jpg) 
![shady-groves-campsite](https://assets.datacamp.com/production/repositories/3582/datasets/96303e15fecb4a3c4d125265528b280724ef3fc3/shady_groves_campsite.jpg) 
![happy-mountain-trailhead](https://assets.datacamp.com/production/repositories/3582/datasets/9448ff116980a637fe1e8f2d9a9305ff219e9d18/happy_mountain_trailhead.jpg)

The radii of individual gravel pieces have been loaded into the DataFrame `gravel`, and `matplotlib.pyplot` has been loaded under the alias `plt`.
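One detail is worth knowing when a histogram is normalized with `density=True`, as in the later steps of this exercise: the bar heights are scaled so that the total area of the bars is 1, which is not the same as the heights adding up to 1. The sketch below checks this with `numpy.histogram`, which performs the same binning, on made-up radii.

```python
import numpy as np

# Hypothetical gravel radii in millimetres
radii = np.array([2.5, 3.0, 3.2, 4.1, 4.4, 4.5, 5.0, 6.3, 7.1, 7.9])

# Same bins, range, and normalization as the histogram in this exercise
heights, edges = np.histogram(radii, bins=40, range=(2, 8), density=True)
widths = np.diff(edges)

# Area = height * width, summed over all bins
total_area = float(np.sum(heights * widths))
sum_of_heights = float(np.sum(heights))

print(total_area)
print(sum_of_heights)
```

Because each bin here is narrower than 1 mm, the heights alone add up to more than 1 even though the area is 1.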

In [None]:
# Create a histogram of gravel.radius
plt.hist(gravel.radius)

# Display histogram
plt.show()


# Modify the histogram such that the histogram is divided into 40 bins and the range is from 2 to 8.
# Create a histogram
# Range is 2 to 8, with 40 bins
plt.hist(gravel.radius, range=(2,8), bins=40)

# Display histogram
plt.show()


# Normalize your histogram with density=True so that the total area of the bars is 1.
# Create a histogram
# Normalize to 1
plt.hist(gravel.radius,
         bins=40,
         range=(2, 8),
         density=True)

# Display histogram
plt.show()


# Label the x-axis (Gravel Radius (mm)), the y-axis (Frequency), and the title(Sample from Shoeprint).
# Create a histogram
plt.hist(gravel.radius,
         bins=40,
         range=(2, 8),
         density=True)

# Label plot
plt.xlabel('Gravel Radius (mm)')
plt.ylabel('Frequency')
plt.title('Sample from Shoeprint')

# Display histogram
plt.show()