# Data Acquisition

---

This notebook outlines the process for acquiring the Telco customer data needed for this project. It also demonstrates the steps to create and test the functions in the util/acquire.py file.

---

## Accessing the Database

The Telco customer data is stored in the telco_churn database on the MySQL server at data.codeup.com. To access this data you will need login credentials. Assuming you have credentials, save them in an env.py file in the following form:

In [2]:
username = 'your_username'
password = 'your_password'
hostname = 'data.codeup.com'

Save this file in the notebooks directory (also save a copy in the util directory for use with the final report notebook). Once we have our login credentials we can import them from env and create our database URL.

In [6]:
from env import username, password, hostname
database_name = 'telco_churn'

# This is the template for a database URL; we simply plug in our login credentials and the database we want to read from.
url = f'mysql+pymysql://{username}:{password}@{hostname}/{database_name}'

For our convenience we will turn this into a function.

In [7]:
def get_db_url(database_name, username = username, password = password, hostname = hostname):
    return f'mysql+pymysql://{username}:{password}@{hostname}/{database_name}'
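
One caveat with building the URL from an f-string: if a password contains characters such as `@`, `:` or `/`, the URL can be parsed incorrectly unless those characters are percent-encoded. The following is a minimal sketch of one way to handle this, using `quote_plus` from the standard library. The function name `build_db_url` and the sample password are hypothetical and are not part of util/acquire.py.

```python
from urllib.parse import quote_plus


def build_db_url(database_name, username, password, hostname):
    # Hypothetical variant of get_db_url: percent-encode the credentials
    # so characters like '@', ':' and '/' do not break the URL structure.
    return (
        f'mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}'
        f'@{hostname}/{database_name}'
    )


# Made-up password chosen only to show the encoding
url = build_db_url('telco_churn', 'your_username', 'p@ss:word/1', 'data.codeup.com')
print(url)
```

If your credentials contain only letters and digits, the simpler f-string version above behaves the same way.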

---

## Acquiring the Data

The database contains several tables related by foreign keys. To acquire all of the data we need to join these tables together. We can achieve this with the following SQL query:

In [8]:
telco_query = '''
    SELECT *
    FROM customers
    JOIN payment_types USING (payment_type_id)
    JOIN internet_service_types USING (internet_service_type_id)
    JOIN contract_types USING (contract_type_id);
'''

For our convenience we can create a function that returns this query for us.

In [9]:
def get_telco_sql():
    return '''
        SELECT *
        FROM customers
        JOIN payment_types USING (payment_type_id)
        JOIN internet_service_types USING (internet_service_type_id)
        JOIN contract_types USING (contract_type_id);
    '''
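
To see what a `JOIN ... USING` clause does without needing database credentials, here is a small sketch that runs against an in-memory SQLite database. It is an illustration only: the two tables are cut-down stand-ins for the real schema, seeded with values taken from the sample rows shown in this notebook, and SQLite is used in place of MySQL.

```python
import sqlite3

# In-memory database with simplified, hypothetical versions of two tables
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE contract_types (
        contract_type_id INTEGER PRIMARY KEY,
        contract_type TEXT
    );
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY,
        contract_type_id INTEGER
    );
    INSERT INTO contract_types VALUES (1, 'Month-to-month'), (2, 'One year');
    INSERT INTO customers VALUES ('0002-ORFBO', 2), ('0003-MKNFE', 1);
''')

# USING matches rows on the shared contract_type_id column, replacing
# each customer's numeric key with the readable contract type.
rows = conn.execute('''
    SELECT customer_id, contract_type
    FROM customers
    JOIN contract_types USING (contract_type_id)
    ORDER BY customer_id;
''').fetchall()
conn.close()

for row in rows:
    print(row)
```

The full query applies the same idea three times, once for each lookup table.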

Next we want to read the dataset from the database. We would also like to cache this data for quicker access in the future. To do this we must check whether a cached file already exists before we read from the database. We can achieve this using pandas and the os module from the Python standard library.

In [12]:
import pandas as pd
import os

# Check if a cached file already exists
os.path.exists('telco.csv')

False

In [13]:
# If it doesn't exist we read from the database
df = pd.read_sql(get_telco_sql(), get_db_url('telco_churn'))
df.head(2)

| | contract_type_id | internet_service_type_id | payment_type_id | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | ... | tech_support | streaming_tv | streaming_movies | paperless_billing | monthly_charges | total_charges | churn | payment_type | internet_service_type | contract_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 0002-ORFBO | Female | 0 | Yes | Yes | 9 | Yes | ... | Yes | Yes | No | Yes | 65.6 | 593.3 | No | Mailed check | DSL | One year |
| 1 | 1 | 1 | 2 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | ... | No | No | Yes | No | 59.9 | 542.4 | No | Mailed check | DSL | Month-to-month |


In [15]:
# Now we can save the data to a .csv file
# Be sure to set index to False so we don't save the index to the .csv file
df.to_csv('telco.csv', index = False)
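
The effect of `index = False` can be checked with a small round trip. This sketch uses a made-up two-row DataFrame and in-memory buffers instead of the real data and telco.csv; it shows that writing the index and reading the file back introduces an extra `Unnamed: 0` column.

```python
import io

import pandas as pd

# Tiny stand-in DataFrame (not the real Telco data)
sample = pd.DataFrame({'customer_id': ['0002-ORFBO', '0003-MKNFE'], 'tenure': [9, 9]})

# Default behavior: the index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# With index=False only the real columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```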

In [17]:
# Now that our data is cached we can read it from the .csv file
df = pd.read_csv('telco.csv')
df.head(2)

| | contract_type_id | internet_service_type_id | payment_type_id | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | ... | tech_support | streaming_tv | streaming_movies | paperless_billing | monthly_charges | total_charges | churn | payment_type | internet_service_type | contract_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 0002-ORFBO | Female | 0 | Yes | Yes | 9 | Yes | ... | Yes | Yes | No | Yes | 65.6 | 593.3 | No | Mailed check | DSL | One year |
| 1 | 1 | 1 | 2 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | ... | No | No | Yes | No | 59.9 | 542.4 | No | Mailed check | DSL | Month-to-month |


For our convenience we can put all of this into a function that checks whether the .csv file exists and reads from it if so; otherwise it reads from the MySQL database and caches the data for quicker access. We will also include a use_cache parameter that indicates whether to use the .csv file when it is available, in case we would like to pull the data from the database even if the .csv file exists.

In [20]:
def get_telco_data(use_cache = True):
    # If the file is cached, read from the .csv file
    if os.path.exists('telco.csv') and use_cache:
        print('Using cache')
        return pd.read_csv('telco.csv')
    
    # Otherwise read from the mysql database
    else:
        print('Reading from database')
        df = pd.read_sql(get_telco_sql(), get_db_url('telco_churn'))
        df.to_csv('telco.csv', index = False)
        return df

In [23]:
# Let's test our function
get_telco_data().head(2)

Using cache


| | contract_type_id | internet_service_type_id | payment_type_id | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | ... | tech_support | streaming_tv | streaming_movies | paperless_billing | monthly_charges | total_charges | churn | payment_type | internet_service_type | contract_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 0002-ORFBO | Female | 0 | Yes | Yes | 9 | Yes | ... | Yes | Yes | No | Yes | 65.6 | 593.3 | No | Mailed check | DSL | One year |
| 1 | 1 | 1 | 2 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | ... | No | No | Yes | No | 59.9 | 542.4 | No | Mailed check | DSL | Month-to-month |


In [22]:
get_telco_data(use_cache = False).head(2)

Reading from database


| | contract_type_id | internet_service_type_id | payment_type_id | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | ... | tech_support | streaming_tv | streaming_movies | paperless_billing | monthly_charges | total_charges | churn | payment_type | internet_service_type | contract_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 0002-ORFBO | Female | 0 | Yes | Yes | 9 | Yes | ... | Yes | Yes | No | Yes | 65.6 | 593.3 | No | Mailed check | DSL | One year |
| 1 | 1 | 1 | 2 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | ... | No | No | Yes | No | 59.9 | 542.4 | No | Mailed check | DSL | Month-to-month |
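
The same check-then-read pattern can be exercised without database access. The sketch below is a generalized, hypothetical variant of get_telco_data: `read_with_cache` and `fake_loader` are names introduced here for illustration, the loader stands in for the pd.read_sql call, and a temporary directory keeps the example from touching the real telco.csv.

```python
import os
import tempfile

import pandas as pd


def read_with_cache(cache_path, loader, use_cache=True):
    # Read the cached .csv when it exists and caching is allowed
    if use_cache and os.path.exists(cache_path):
        return pd.read_csv(cache_path)
    # Otherwise call the loader and refresh the cache
    df = loader()
    df.to_csv(cache_path, index=False)
    return df


calls = []  # records each time the stand-in "database" is queried


def fake_loader():
    # Stand-in for the database read; returns made-up rows
    calls.append(1)
    return pd.DataFrame({'customer_id': ['0002-ORFBO', '0003-MKNFE'], 'tenure': [9, 9]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'telco.csv')
    first = read_with_cache(path, fake_loader)                    # no cache yet: loader runs
    second = read_with_cache(path, fake_loader)                   # cache exists: loader skipped
    third = read_with_cache(path, fake_loader, use_cache=False)   # forced refresh: loader runs

second_ids = list(second['customer_id'])
print(len(calls))
```

Passing the loader in as an argument is a design choice made for this sketch; it lets the caching logic be tested separately from the database connection.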


---

## Conclusion

With the get_telco_sql() and get_telco_data() functions now available, we can easily acquire the customer data we need to proceed with the project.