# CHAPTER TEN

## Introduction to Data Analysis

We’ll learn about the Pandas library and how to work with tabular data structures, web scraping with BeautifulSoup and understanding how to parse data, as well as data visualization libraries like matplotlib. At the end of the week, we’ll use all these libraries together to create a small project that scrapes and analyzes web sites.

###### Overview

• Working with Anaconda environments and sending requests

• Learning how to analyze tabular data structures with Pandas

• Understanding how to present data using matplotlib

• Using the BeautifulSoup library to scrape the Web for data

• Creating a web site analysis tool

###### Challenge Question

Imagine you’re a data analyst and you’ve just been handed a set of data that shows the number of accidents for all drivers, their ages, and the size of their engines. You need to figure out a way to display this information so that it tells a story. Normally you would create a graph with x, y, and z coordinates; however, that can become complicated, and you don’t have time for that. How would you render the information so that it still conveys three dimensions while using only the x and y axes?

### Monday: Virtual Environments and Requests Module

###### Virtual Environments

Python virtual environments are essentially a tool that allows you to keep the dependencies of one project in a separate space from those of other projects. Most projects in Python need to use modules that are not included by default with Python. Now, you could simply download the modules (or libraries) into your Python folder to use; however, that could cause some issues down the road. Let’s say you’re working on two separate projects, where the first one uses Python version 2.7 and the second project uses Python version 3.5. If you try to use the same syntax for both, you’ll run into several issues. Instead, you would create two separate virtual environments, one for each project.

Note: When creating a virtual environment, a folder called “venv” will appear. This is where all the libraries that you download are saved. Simply put, a virtual environment is not much more than a folder that stores other files.

Note: Remember from the first chapter, when working in terminal, you’ll see the $ next to the commands that we enter. For the next few sections, we’ll be working inside of terminal.

##### Pip
Pip is the standard package manager for Python. It is used for downloading, installing, uninstalling, and managing Python modules and libraries that are not built in. It has been included with Python installations since v3.4. To check your version of pip, write the following in terminal:
    $ pip --version
    
Feel free to visit the Python Package Index (PyPI) to view all the possible libraries that you’re able to download.
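
As a rough complement to `pip --version`, installed package versions can also be checked from inside Python. The sketch below assumes Python 3.8 or newer, where the standard-library `importlib.metadata` module is available; the helper name `installed_version` is hypothetical and only used for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The package names below are only examples; results depend on your environment.
for name in ["pip", "requests", "pandas"]:
    print(name, installed_version(name))
```

A result of `None` simply means the package is not installed in the environment that is currently active.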


###### Creating a Virtual Environment
Use the following command:


    $ conda create --name data_analysis python=3.7
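
Once an environment has been created and activated, one quick way to confirm which interpreter is actually running is to ask Python itself. This is a minimal sketch using only the standard `sys` module; the helper name `describe_interpreter` is hypothetical.

```python
import sys


def describe_interpreter():
    """Summarize the interpreter that is currently running this code."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,  # path to the python binary in use
        "prefix": sys.prefix,          # root folder of the active environment
    }


info = describe_interpreter()
for key, value in info.items():
    print("{}: {}".format(key, value))
```

If the environment above is active, the version should start with 3.7 and the prefix should point at the environment's folder.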

In [1]:
a = 0
b = 0
def multiply(a, b):
    return a * b  # return the product so the call has a value

multiply(4, 6)

###### APIs and the Requests Module

The requests module allows us to make HTTP requests using Python. It is a third-party package (not part of the Python standard library) and is the de facto standard for making API calls and requesting information from outside resources.

###### Note

If you’re unfamiliar with HTTP requests, I suggest checking out the w3schools resource for more information, as this book is not designed to cover networking.

An application programming interface (API) is a set of functions and procedures that allow applications to access the features or data of an operating system, application, or other service. In a simpler description, APIs allow us to interact with web pages and software designed by other developers.

#### Virtual Environments and Requests Module.

###### Sending a request

For this lesson, we’ll be requesting information from an API created by Github. Generally, APIs require a key in order to use their service; however, we’ll be using one that doesn’t require an API key. To begin, we must send a request to a specific URL,
which will send a response back to us. That response will include data that we’ll be able to parse through. Write the following:

In [1]:
#  Sending a request and logging the response
import requests
r = requests.get("https://api.github.com/users/connor-SM")
print(r)
print(type(r))

<Response [200]>
<class 'requests.models.Response'>


###### Accessing the Response Content 
In order to access the data that we get back in the response, we need to access the content attribute of our response object:

In [2]:
## Accessing the content that we requested from the URL
data = r.content
print(data)

b'{"login":"Connor-SM","id":20958711,"node_id":"MDQ6VXNlcjIwOTU4NzEx","avatar_url":"https://avatars.githubusercontent.com/u/20958711?v=4","gravatar_id":"","url":"https://api.github.com/users/Connor-SM","html_url":"https://github.com/Connor-SM","followers_url":"https://api.github.com/users/Connor-SM/followers","following_url":"https://api.github.com/users/Connor-SM/following{/other_user}","gists_url":"https://api.github.com/users/Connor-SM/gists{/gist_id}","starred_url":"https://api.github.com/users/Connor-SM/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/Connor-SM/subscriptions","organizations_url":"https://api.github.com/users/Connor-SM/orgs","repos_url":"https://api.github.com/users/Connor-SM/repos","events_url":"https://api.github.com/users/Connor-SM/events{/privacy}","received_events_url":"https://api.github.com/users/Connor-SM/received_events","type":"User","site_admin":false,"name":"Connor Milliken","company":"HubSpot, Inc.","blog":"www.connormilliken.c

##### Converting the Response
Luckily for us, the response object comes with a built-in JSON conversion method called json(). After we convert the response to a dictionary, let’s output all the key-value pairs:

In [3]:
# Converting data from JSON into a python dictionary and outputting all key-value pairs
data = r.json() # Converting the JSON response body into a dictionary
for k, v in data.items():
    print("Key: {} \t Value: {}".format(k,v))
print(data["name"]) # Accessing data directly

Key: login 	 Value: Connor-SM
Key: id 	 Value: 20958711
Key: node_id 	 Value: MDQ6VXNlcjIwOTU4NzEx
Key: avatar_url 	 Value: https://avatars.githubusercontent.com/u/20958711?v=4
Key: gravatar_id 	 Value: 
Key: url 	 Value: https://api.github.com/users/Connor-SM
Key: html_url 	 Value: https://github.com/Connor-SM
Key: followers_url 	 Value: https://api.github.com/users/Connor-SM/followers
Key: following_url 	 Value: https://api.github.com/users/Connor-SM/following{/other_user}
Key: gists_url 	 Value: https://api.github.com/users/Connor-SM/gists{/gist_id}
Key: starred_url 	 Value: https://api.github.com/users/Connor-SM/starred{/owner}{/repo}
Key: subscriptions_url 	 Value: https://api.github.com/users/Connor-SM/subscriptions
Key: organizations_url 	 Value: https://api.github.com/users/Connor-SM/orgs
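
The conversion step can also be sketched without a network call. The example below uses the standard `json` module on a shortened, hand-copied sample of the response body shown earlier; it is meant to illustrate roughly what `json()` does with the raw bytes, not the library's exact internals. The missing `email` key is an assumption made to demonstrate `.get()`.

```python
import json

# Shortened sample of the raw response body (only a few fields are kept)
raw = b'{"login": "Connor-SM", "id": 20958711, "site_admin": false, "name": "Connor Milliken"}'

data = json.loads(raw)  # bytes -> dict; JSON false becomes Python False

print(type(data))
print(data["name"])

# .get() returns a default instead of raising KeyError when a key is missing
print(data.get("email", "not provided"))
```

Using `.get()` with a default is a gentle way to handle API responses whose fields are optional.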


###### Passing Parameters
Most API calls that you perform will require extra information like parameters or headers. This information is taken in by the API and used to perform a specific task. Let’s perform a call this time while passing parameters in the URL to search for Python-specific repositories on Github:

In [4]:
# outputting specific key-value pairs from data
r = requests.get("https://api.github.com/search/repositories?q=language:python")
data = r.json()
print(data["total_count"]) # output the total number of repositories that use python

8414792
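
Rather than writing the query string by hand, the parameters can be passed as a dictionary through the `params` argument and left for requests to encode. The sketch below builds the request without sending it, so the final URL can be inspected offline; the commented-out `timeout` value is an arbitrary choice for illustration.

```python
from urllib.parse import urlsplit, parse_qs

import requests

url = "https://api.github.com/search/repositories"
params = {"q": "language:python"}

# Build the request without sending it, so we can inspect the final URL
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)

# Decode the query string back into a dictionary to confirm what was encoded
query = parse_qs(urlsplit(prepared.url).query)
print(query)

# Sending it would look like this (requires network access):
# r = requests.get(url, params=params, timeout=10)
```

Passing a dictionary keeps special characters such as the colon correctly escaped and makes it easier to swap in a different search term.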


###### Monday Exercises

1. Test Environment: Create a new virtual environment called “test.” When creating it, install Python version 2.7 instead of the current version. After it’s completed, make sure it installed the proper version of Python by checking the list.

2. JavaScript Repositories: Using the requests module and the Github API link in our last lesson, figure out how many repositories on Github use JavaScript.

##### Solutions
###### QN 1.
(base) C:\Users\Lenovo\Desktop\python_bootcamp>conda create --name test python=2.7
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.11.0
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: C:\Users\Lenovo\miniconda3\envs\test

  added / updated specs:
    - python=2.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.6.20          |     pyhd3eb1b0_3         155 KB
    pip-19.3.1                 |           py27_0         1.7 MB
    python-2.7.18              |       hfb89ab9_0        15.5 MB
    setuptools-44.0.0          |           py27_0         528 KB
    sqlite-3.30.1              |       h0c8e037_0         588 KB
    vc-9                       |       h2eaa2aa_6           5 KB
    vs2008_runtime-9.00.30729.1|       haa95532_6         501 KB
    wincertstore-0.2           |   py27hf04cefb_0          14 KB
    ------------------------------------------------------------
                                           Total:        18.9 MB

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/win-64::ca-certificates-2022.4.26-haa95532_0
  certifi            pkgs/main/noarch::certifi-2020.6.20-pyhd3eb1b0_3
  pip                pkgs/main/win-64::pip-19.3.1-py27_0
  python             pkgs/main/win-64::python-2.7.18-hfb89ab9_0
  setuptools         pkgs/main/win-64::setuptools-44.0.0-py27_0
  sqlite             pkgs/main/win-64::sqlite-3.30.1-h0c8e037_0
  vc                 pkgs/main/win-64::vc-9-h2eaa2aa_6
  vs2008_runtime     pkgs/main/win-64::vs2008_runtime-9.00.30729.1-haa95532_6
  wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
  wincertstore       pkgs/main/win-64::wincertstore-0.2-py27hf04cefb_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
wincertstore-0.2     | 14 KB     | ############################################################################ | 100%
certifi-2020.6.20    | 155 KB    | ############################################################################ | 100%
sqlite-3.30.1        | 588 KB    | ############################################################################ | 100%
vs2008_runtime-9.00. | 501 KB    | ############################################################################ | 100%
pip-19.3.1           | 1.7 MB    | ############################################################################ | 100%
vc-9                 | 5 KB      | ############################################################################ | 100%
setuptools-44.0.0    | 528 KB    | ############################################################################ | 100%
python-2.7.18        | 15.5 MB   | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate test
#
# To deactivate an active environment, use
#
#     $ conda deactivate


(base) C:\Users\Lenovo\Desktop\python_bootcamp>conda activate test

(test) C:\Users\Lenovo\Desktop\python_bootcamp>python
Python 2.7.18 |Anaconda, Inc.| (default, Apr 23 2020, 17:26:54) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import sys
>>> os.path.dirname(sys.executable)
'C:\\Users\\Lenovo\\miniconda3\\envs\\test'
>>> quit()

(test) C:\Users\Lenovo\Desktop\python_bootcamp>


In [7]:
##### QN 2.

# outputting specific key-value pairs from data
p = requests.get("https://api.github.com/search/repositories?q=language:javascript")
data = p.json()
print(data["total_count"]) # output the total number of repositories that use JavaScript

16513796


### Tuesday: Pandas

As with yesterday’s lesson, we first need to install the Pandas library into our virtual environment. To follow along with today’s lesson, cd into the “python_bootcamp” folder, and activate the environment. We’ll begin today within the terminal.

##### What is Pandas?

Pandas is a flexible data analysis library that is excellent for working with tabular data. It is built on top of NumPy, and its performance-critical parts are written in Cython and C, which keeps calculations fast. It is currently the de facto standard for Python-based data analysis, and fluency in Pandas will do wonders for your productivity and frankly your resume. It is one of the fastest ways of getting from zero to answer. The Pandas module is a high-performance, high-level data analysis library. It allows us to work with large sets of data in tabular structures called DataFrames.

###### Note
NumPy is a fundamental package for scientific computing in Python. Its core is written in C; it uses multidimensional arrays and can perform calculations at high speed.

##### Pandas is useful in the following ways:

• Calculating statistics and answering questions about the data, like the average, median, max, and min of each column

• Finding correlations between columns

• Tracking the distribution of one or more columns

• Visualizing the data with the help of matplotlib, using plot bars, histograms, etc.

• Cleaning and filtering data, whether it’s missing or incomplete, just by applying a user-defined function (UDF) or built-in function

• Transforming tabular data into Python to work with

• Exporting the data into a CSV, other file, or database

• Feature engineering new columns that can be applied to your analysis
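
As one illustration of the import and export items in the list above, a small DataFrame can be written out as CSV text and read back in. This is only a sketch: it uses an in-memory `io.StringIO` buffer in place of a real file, and the two-row table is made up for the demo.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"names": ["Jess", "Tim"], "ages": [25, 35]})

# Export: write the table as CSV text (index=False leaves out the row labels)
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Import: read the same CSV text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

With a real file, the buffer would be replaced by a path such as a hypothetical "people.csv".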

###### Key Terms

• Series ➤ One-dimensional labeled array capable of holding data of any type

• DataFrame ➤ Two-dimensional labeled table of rows and columns, similar to a spreadsheet

• Axis ➤ Column or row, axis = 0 by row; axis = 1 by column

• Record ➤ A single row

• dtype ➤ Data type for DataFrame or series object

• Time Series ➤ Series object that uses time intervals, like tracking weather by the hour
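
The key terms above can be seen side by side in a few lines. The sketch below builds a tiny made-up table (the variable names are assumptions for the demo) and touches a Series, a record, a dtype, and an axis.

```python
import pandas as pd

df_terms = pd.DataFrame({"names": ["Jess", "Tim", "Terry"], "ages": [25, 35, 22]})

ages = df_terms["ages"]       # a single column is a Series
print(type(ages).__name__)
print(ages.dtype)             # the dtype describes the kind of values stored

record = df_terms.loc[0]      # a single record (row), also returned as a Series
print(record["names"])

# axis=0 combines values down the rows, giving one result per column
totals = df_terms[["ages"]].sum(axis=0)
print(totals["ages"])
```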

###### Installing Pandas
    $ pip install pandas

#### Pandas

##### Importing Pandas
To follow along with the rest of this lesson, let’s open and continue from our previous notebook file “Week_10” and simply add a markdown cell at the bottom that says, “Pandas.” Importing Pandas is simple; however, there is an industry standard when you import the library:



In [9]:
# Importing the pandas library
import pandas as pd # industry standard name of pd when importing

In [11]:
## Creating a DataFrame
# using the from_dict method to convert a dictionary into Pandas DataFrame
import random

random.seed(3) # generate the same random numbers every time; the number used doesn't matter
names = ["Jess", "Tim", "Terry", "Mark", "Tyler", "Rebecca"]
ages = [random.randint(18, 35) for x in range(len(names))]

people = {"names": names, "ages": ages}

df = pd.DataFrame.from_dict(people)
print(df)

     names  ages
0     Jess    25
1      Tim    35
2    Terry    22
3     Mark    29
4    Tyler    33
5  Rebecca    20


Go ahead and run the cell. We import the random module so that we can create a random age for each person in the list comprehension that builds ages. Calling random.seed(3) first gives us both the same random numbers to work with. You could pass any number as the argument into seed; however, if you use a number other than 3, you’ll get a different output than this book.

Note: Random numbers aren’t truly random; they follow a specific algorithm to return a number.

After we generate a list of names and random ages for each person, we create a dictionary called “people.” The magic truly happens in the call to pd.DataFrame.from_dict, where we use Pandas to create the DataFrame that we’ll be working with. When it’s created, it uses the keys as the column names, and the values match up with the corresponding index, such that names[0] and ages[0] will be a single record.

##### Accessing Data
There are a few different ways that we can access the data within a DataFrame. You have the option to choose by the column or by the record. Let’s look at how to do both.


###### Indexing by a column
Accessing data by a column is the same as accessing data from a dictionary with the key. Within the first set of brackets, you put the column name that you would like to access.

In [12]:
# directly selecting a column in Pandas
print(df["ages"])
print(df["ages"][3]) # Select the value of "ages" in the fourth row (zero-based index)

# print(df[4]) doesn't work, 4 is not a column name

0    25
1    35
2    22
3    29
4    33
5    20
Name: ages, dtype: int64
29

In [13]:
### Indexing by Record
# directly selecting a record in Pandas using .loc
print(df.loc[0])
print(df.loc[0]["names"]) # selecting the value at record 0 in the "names" column

names    Jess
ages       25
Name: 0, dtype: object
Jess
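
Record access comes in two flavors: `.loc` selects by label and `.iloc` selects by position. The sketch below uses a small made-up table with letter labels (an assumption for this demo) so that the difference between the two is visible.

```python
import pandas as pd

people_df = pd.DataFrame(
    {"names": ["Jess", "Tim", "Terry"], "ages": [25, 35, 22]},
    index=["a", "b", "c"],  # custom row labels, chosen only for this demo
)

by_label = people_df.loc["b", "names"]   # label-based: row "b", column "names"
by_position = people_df.iloc[1, 0]       # position-based: second row, first column
print(by_label, by_position)

# Label slices include the end label; position slices exclude the end position
label_slice = people_df.loc["a":"b", "ages"].tolist()
position_slice = people_df.iloc[0:1, 1].tolist()
print(label_slice, position_slice)
```

Passing the row and the column together in one set of brackets, as in `.loc["b", "names"]`, avoids chaining two separate lookups.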


In [15]:
#### Slicing a DataFrame
# slicing a dataframe to grab specific records
print(df[2:5]) #This will output the records at index 2, 3, and 4.

   names  ages
2  Terry    22
3   Mark    29
4  Tyler    33


##### Built-in Methods
###### head()
To see the top records in the DataFrame, along with the column names, you use the head() method:

In [16]:
## Accessing the top 5 records using .head()
df.head(5)

   names  ages
0   Jess    25
1    Tim    35
2  Terry    22
3   Mark    29
4  Tyler    33


###### tail()
To view a given number of records from the bottom of the DataFrame, you use the tail() method:

In [17]:
# accessing the bottom 3 records using tail
df.tail(3)

     names  ages
3     Mark    29
4    Tyler    33
5  Rebecca    20


###### keys()
Sometimes you’ll need the column names. Whether you’re making a modular script or analyzing the data you’re working with, using the keys() method will help:

In [19]:
# Accessing the column headers (keys) using the .keys() method
headers = df.keys()
print(headers)

Index(['names', 'ages'], dtype='object')


###### .shape

The shape of a DataFrame describes the number of records by the number of columns. It’s always important to check the shape to ensure you’re working with the proper amount of data:

In [20]:
# Checking the shape, which is (number of records, number of columns)
print(df.shape)

(6, 2)


###### describe()
The describe method will give you a base analysis for all numerical data. You’ll be able to view min, max, 25%, 50%, mean, etc., on all columns just by calling this method on the DataFrame. This information is helpful to start your analysis but generally won’t answer those questions you’re looking for. Instead, we can use this method as a guideline of where to start:

In [21]:
# checking the general statistics of the DataFrame using .describe(), only works on numerical columns
df.describe()

            ages
count   6.000000
mean   27.333333
std     6.022181
min    20.000000
25%    22.750000
50%    27.000000
75%    32.000000
max    35.000000
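
The figures reported by describe() can also be computed one at a time, which is handy when only a single statistic is needed. The sketch below rebuilds the ages column by hand from the values shown earlier; the `summary` dictionary is just a convenient container for the demo.

```python
import pandas as pd

ages = pd.Series([25, 35, 22, 29, 33, 20], name="ages")

summary = {
    "mean": ages.mean(),
    "median": ages.median(),      # same value as the 50% row of describe()
    "min": ages.min(),
    "max": ages.max(),
    "q75": ages.quantile(0.75),   # same value as the 75% row of describe()
}

for label, value in summary.items():
    print("{}: {}".format(label, value))
```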


###### sort_values()
When you need to sort a DataFrame based on column information, you use this method. You can pass in one or multiple columns to be sorted by. When passing multiple, you must pass them in as a list of column names, in which the first name will take precedence:

In [23]:
# sort based on a given column; sort_values() returns a new sorted DataFrame and leaves df intact
df.sort_values("ages").head(5)

     names  ages
5  Rebecca    20
2    Terry    22
0     Jess    25
3     Mark    29
4    Tyler    33
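
Sorting by several columns can be sketched as follows. The table is made up so that the ages contain ties, and the `ascending` list gives one sort direction per column: oldest first, then names alphabetically within each age.

```python
import pandas as pd

staff = pd.DataFrame({
    "names": ["Jess", "Tim", "Terry", "Mark"],
    "ages": [25, 35, 25, 35],  # made-up ages with ties
})

# The first column in the list takes precedence; the second breaks ties
ordered = staff.sort_values(["ages", "names"], ascending=[False, True])
print(ordered)

# sort_values returns a new DataFrame, so the original order is unchanged
print(staff["names"].tolist())
```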


#### Filtration
Filter DataFrames for information that meets a specific condition

###### Conditionals
Rather than filtering out information, we can create a boolean data type column that represents the condition we’re checking. Let’s take our current DataFrame and write a
condition that shows those who are 21 or older and can drink:

In [24]:
# Using a conditional to create a true/false column to work with 
can_drink = df["ages"] >= 21
print(can_drink)

0     True
1     True
2     True
3     True
4     True
5    False
Name: ages, dtype: bool


#### Subsetting

Subsetting filters out records while leaving the original DataFrame unchanged. We will use subsetting to filter out records rather than create a true-false representation:

In [26]:
# using subsetting to filter out records and keep DataFrame intact

df[df["ages"] >= 21]

# The output contains only those records whose ages are equal to or above 21.

   names  ages
0   Jess    25
1    Tim    35
2  Terry    22
3   Mark    29
4  Tyler    33
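
Conditions can also be combined when subsetting. In the sketch below, which rebuilds the names and ages by hand, `&` means "and", `|` means "or", and each condition is wrapped in its own parentheses; the variable names are assumptions for the demo.

```python
import pandas as pd

df_people = pd.DataFrame({
    "names": ["Jess", "Tim", "Terry", "Mark", "Tyler", "Rebecca"],
    "ages": [25, 35, 22, 29, 33, 20],
})

# Records where the age is at least 21 and below 30
twenties = df_people[(df_people["ages"] >= 21) & (df_people["ages"] < 30)]
print(twenties)

# Records where the age is below 21 or above 34
young_or_old = df_people[(df_people["ages"] < 21) | (df_people["ages"] > 34)]
print(young_or_old["names"].tolist())
```

The parentheses matter because `&` and `|` bind more tightly than the comparison operators.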


#### Column Transformations

Rarely, if ever, will the columns in the original raw DataFrame imported from CSV or a database be the ones you need for your analysis. You will spend lots of time constantly transforming columns or groups of columns using general computational operations to produce new ones that are functions of the old ones. Pandas has full support for this and does it efficiently.

##### Generating a new column with data
To create a new column within a DataFrame, you use the same syntax as if you were adding a new key-value pair into a dictionary. Let’s create a column of fake data that
represents how long the people within our DataFrame have been customers with our company:

In [27]:
# Generating a new column of fake data for each record in the DataFrame to represent customer tenure
random.seed(321)
tenure = [random.randint(0, 10) for x in range(len(df))]
df["tenure"] = tenure # same as adding a new key-value pair in dictionary
df.head()

   names  ages  tenure
0   Jess    25       4
1    Tim    35       6
2  Terry    22       2
3   Mark    29       5
4  Tyler    33       8


#### apply()
Adding new columns based on current data is known as “feature engineering.” It makes up a good portion of a data analyst’s job. Often, you won’t be able to answer the questions you have from the data you collect. Instead, you need to create your own data that is useful for answering questions. For this example, let’s try to answer the following question: “What age group does each customer belong to?” You could look at each person’s age and assume their age group; however, we want to make it easier than that. In order to answer this question easily, we’ll need to feature engineer a new column that represents each customer’s age group. We can do this by using the apply method on the ages column. The apply method takes in each value of the column, applies the function passed, and returns the results, which we assign as the new column data. Let’s check it out:

In [28]:
# Feature engineering a new column from known data using a UDF
def ageGroup(age):
    return "Teenager" if age<21 else "Adult"
df["age_group"] = df["ages"].apply(ageGroup)
df.head(10)

     names  ages  tenure age_group
0     Jess    25       4     Adult
1      Tim    35       6     Adult
2    Terry    22       2     Adult
3     Mark    29       5     Adult
4    Tyler    33       8     Adult
5  Rebecca    20       7  Teenager
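
For a simple two-way rule like this one, a vectorized alternative to apply is NumPy's `where` function, and apply can also be run row by row with `axis=1` when several columns are needed at once. The sketch below shows both on a hand-built copy of the data; the `label` column is a hypothetical addition for the demo.

```python
import numpy as np
import pandas as pd

customers = pd.DataFrame({
    "names": ["Jess", "Tim", "Terry", "Mark", "Tyler", "Rebecca"],
    "ages": [25, 35, 22, 29, 33, 20],
})

# Vectorized: evaluate the condition for the whole column in one call
customers["age_group"] = np.where(customers["ages"] < 21, "Teenager", "Adult")

# Row-wise apply (axis=1) passes each record, so several columns can be combined
customers["label"] = customers.apply(
    lambda row: "{} ({})".format(row["names"], row["age_group"]), axis=1
)
print(customers)
```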


#### Aggregations 
The raw data plus transformations is generally only half the story. Your objective is to extract actual insights and actionable conclusions from the data, and that means reducing it from potentially billions of rows to a summary of statistics via aggregation functions. This section assumes some knowledge of SQL and the GROUP BY statement. If you’re not familiar with how GROUP BY works in SQL, visit w3schools for reference material:

www.w3schools.com/sql/sql_groupby.asp

###### groupby()
In order to condense the information down to a summary of statistics, we’ll need to use the groupby method that Pandas has. Whenever you group information together, you need to use an aggregate function to let the program know how to group the information together. For now, let’s count how many records of each age group there are within our DataFrame:

In [29]:
# grouping the records together to count how many records in each group
df.groupby("age_group", as_index=False).count().head()

  age_group  names  ages  tenure
0     Adult      5     5       5
1  Teenager      1     1       1


##### mean()
Instead of counting how many records there are in each category, let’s go ahead and find the averages of each column by using the mean method. We’ll group based on the same column:

In [31]:
# grouping the data to see the averages of all numeric columns
# numeric_only=True skips text columns such as names, which cannot be averaged
df.groupby("age_group", as_index=False).mean(numeric_only=True).head()

  age_group  ages  tenure
0     Adult  28.8     5.0
1  Teenager  20.0     7.0


##### groupby() with multiple columns
When you need to group by multiple columns, the arguments must be passed in as a list. The first item in the list will be the main column that the DataFrame is grouped by. In our case, let’s check how many adults have a tenure of five years:

In [33]:
# grouping information in their age group, then by their tenure
df.groupby(["age_group", "tenure"], as_index=False).count().head(10)

  age_group  tenure  names  ages
0     Adult       2      1     1
1     Adult       4      1     1
2     Adult       5      1     1
3     Adult       6      1     1
4     Adult       8      1     1
5  Teenager       7      1     1
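
When different columns need different aggregate functions, the `agg` method can compute them in one pass. This is a sketch using named aggregation on a hand-built copy of the data; the output column names `n_customers`, `avg_age`, and `max_tenure` are made up for the demo.

```python
import pandas as pd

customers = pd.DataFrame({
    "names": ["Jess", "Tim", "Terry", "Mark", "Tyler", "Rebecca"],
    "ages": [25, 35, 22, 29, 33, 20],
    "tenure": [4, 6, 2, 5, 8, 7],
    "age_group": ["Adult", "Adult", "Adult", "Adult", "Adult", "Teenager"],
})

# Each keyword is a new column name mapped to (source column, aggregate function)
summary = customers.groupby("age_group", as_index=False).agg(
    n_customers=("names", "count"),
    avg_age=("ages", "mean"),
    max_tenure=("tenure", "max"),
)
print(summary)
```

Naming each result up front avoids the repeated count columns seen in the outputs above.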


##### Adding a Record
To add a record into the DataFrame, you’ll need to access the next index and assign the values as a list, in the same order as the columns. In our case, the next index would be 6. Let’s add a row whose name already exists in our DataFrame, so we can see how to remove duplicate information in the next cell:

In [34]:
# adding a record to the bottom of the DataFrame
df.loc[6] = ["Jess", 25, 2, "Adult"] # add a record; values follow the column order
df.head(10)

     names  ages  tenure age_group
0     Jess    25       4     Adult
1      Tim    35       6     Adult
2    Terry    22       2     Adult
3     Mark    29       5     Adult
4    Tyler    33       8     Adult
5  Rebecca    20       7  Teenager
6     Jess    25       2     Adult


##### drop_duplicates()
It’s imperative that you remove all duplicate records as it will skew your data, resulting in incorrect answers. You can remove duplicate records based on a single column or an entire record being identical. In our case, let’s remove duplicates based on similar names, which will remove the record we just added into our DataFrame:

In [35]:
# removing duplicates based on the same names
df = df.drop_duplicates(subset="names")
df.head()

   names  ages  tenure age_group
0   Jess    25       4     Adult
1    Tim    35       6     Adult
2  Terry    22       2     Adult
3   Mark    29       5     Adult
4  Tyler    33       8     Adult


Note: Omitting the subset argument will remove only duplicate records that have identical values in all columns.
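
Before dropping anything, it can help to see which records would be flagged. The sketch below uses a small made-up table to show the `duplicated` method and the `keep` parameter, which controls which copy survives.

```python
import pandas as pd

records = pd.DataFrame({
    "names": ["Jess", "Tim", "Jess"],
    "ages": [25, 35, 25],
    "tenure": [4, 6, 2],
})

# True marks a record whose name already appeared in an earlier row
flags = records.duplicated(subset="names").tolist()
print(flags)

# Without subset, rows must match in every column; tenure differs, so nothing is dropped
print(len(records.drop_duplicates()))

# keep="last" retains the final occurrence instead of the first
last_kept = records.drop_duplicates(subset="names", keep="last")
print(last_kept)
```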

#### Pandas Joins

Often, you will have to combine data from several different sources to obtain the actual dataset you need for your exploration or modeling. Pandas draws heavily on SQL in its design for joins. This section assumes some knowledge of SQL and SQL joins. If you’re not familiar with how joins work in SQL, visit w3schools for reference material:

www.w3schools.com/sql/sql_join.asp

###### Creating a second DataFrame

Let’s create a secondary DataFrame to represent our customers posting ratings about our company. We’ll create ratings for three users so we can see both inner joins and
outer joins:


In [36]:
# creating another fake DataFrame to work with, having same names and new ratings column
ratings = {
    "names": ["Jess", "Tyler", "Ted"],
    "ratings": [10, 9, 6]
}
ratings = pd.DataFrame.from_dict(ratings) # from_dict is called on the DataFrame class, not on df
ratings.head()

   names  ratings
0   Jess       10
1  Tyler        9
2    Ted        6


###### Inner Join

Anytime you perform a join, you need a unique column to join the data with. In our case, we can use the names column to join the ratings DataFrame with our original DataFrame. Let’s perform an inner join on these two datasets so that we can connect users with their ratings:

In [37]:
# Performing an inner join with df and ratings DataFrames based on names, get data that matches
matched_ratings = df.merge(ratings, on="names", how="inner")
matched_ratings.head()

   names  ages  tenure age_group  ratings
0   Jess    25       4     Adult       10
1  Tyler    33       8     Adult        9


###### Outer Join
If we want to return all the records, but connect the ratings for people who gave one, we would need to perform an outer join. This would allow us to keep all records from our original DataFrame while adding the ratings column. We need to set the how parameter to “outer”:

In [39]:
# performing an outer join with our df and ratings DataFrames based on names, get all data
all_ratings = df.merge(ratings, on="names", how="outer")
all_ratings.head()

   names  ages  tenure age_group  ratings
0   Jess    25     4.0     Adult     10.0
1    Tim    35     6.0     Adult      NaN
2  Terry    22     2.0     Adult      NaN
3   Mark    29     5.0     Adult      NaN
4  Tyler    33     8.0     Adult      9.0
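
Two related options are worth a quick sketch: a left join, which keeps every record from the first DataFrame only, and the `indicator` parameter, which records where each row came from. The tables below are small hand-built stand-ins for the ones in this lesson, and the variable names are assumptions.

```python
import pandas as pd

customers = pd.DataFrame({"names": ["Jess", "Tim", "Tyler"], "ages": [25, 35, 33]})
ratings_df = pd.DataFrame({"names": ["Jess", "Tyler", "Ted"], "ratings": [10, 9, 6]})

# Left join: keep all customers; those without a rating get NaN
left = customers.merge(ratings_df, on="names", how="left")
print(left)

# indicator=True adds a _merge column: both, left_only, or right_only
audit = customers.merge(ratings_df, on="names", how="outer", indicator=True)
sources = dict(zip(audit["names"], audit["_merge"].astype(str)))
print(sources)
```

The `_merge` column is a quick way to spot records that failed to match, such as a rating from someone who is not in the customer table.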

