# Performing Tasks with APIs


In this assignment, you'll play the role of a Customer Acquisitions Analyst for a B2C software company. You have been asked to review a dataset of leads that the Sales team has generated through a web form on the company’s site. Most of the entries are accurate, but some people have entered fake phone numbers. 

The Sales team has requested that you identify and remove the rows with invalid phone numbers, so that they  don’t waste time calling numbers that don’t exist. Additionally, they want you to calculate the percentage of entries that contain invalid phone numbers. If more than 75% of the numbers are invalid, then the team will prioritize how best to optimize the web form to collect more accurate information. If the percentage is lower than that, then the team will not expend any resources on this task. 

Since the company doesn’t collect any data itself on valid phone numbers, you will use an API to perform this task. Specifically, you will use the [Veriphone](https://rapidapi.com/Veriphone/api/veriphone/endpoints) API to find and remove fake phone numbers.


### Getting Started
To get started, download the following files:
- `Unit 23 - Technical - Unsolved.ipynb` (_this notebook_)
- `ContactsList.csv`

Place these together in to a dedicated directory on your hard drive. We recommend creating a folder in your `Documents` directory for this week of class, as follows:

```
Documents/
  Python/
    Term III/
      Week23/
        Unit 23 - Technical - Unsolved.ipynb
        ContactsList.csv
```

Then, start Jupyter Notebook, navigate to the `Week23` directory, and open `Unit 23 - Technical - Unsolved.ipynb` in your browser. Make sure the `ContactsList.csv` file lives in the same directory.

### Problem Structure
Each problem will be accompanied by:
- **Instructions**
  - Each problem features a markdown cell explaining the problem.
- **Unfinished Code Cells**
  - Each problem has unfinished code cells, where you will write code to solve the problem.
  - Cells will contain either starter code for you to finish, or a comment explaining what your code should do.
- **Expected Output**. 
  - Many unfinished code cells will have output below them. You will be expected to write code that produces the same output.
  - Some unfinished code cells do _not_ have output below them. This is simply because not all code will generate output. Your solutions for these cells should _not_ print anything.
  
### Deliverables
To receive credit for this assignment, you must submit the following files:
- Your completed Jupyter Notebook
- `FilteredContactsList.csv`

Your completed Jupyter Notebook will be this file, but with all of the problems solved. In addition, one of the problems requires that you generate a file called `FilteredContactsList.csv`. When you're done with the assignment, run all cells to verify that your code executes as expected. Then, save and submit both this notebook _and_ the `FilteredContactsList.csv` file.

Good luck!

---

## Part 1: Loading Data

### Problem 1: Loading Data
You have been provided with a `filename` variable, containing the path to the `ContactsList.csv` file. Use it to complete the steps below:
- Read `ContactsList.csv` into a DataFrame called `contacts`
- Print the first 5 rows of `contacts`

---

Your code should print the following:

```
	first_name	last_name	address	city	state	zip	phone	Introduction
0	James	Butt	6649 N Blue Gum St	New Orleans	LA	70116	504-845-1427	Салам! Мен сиздердин компания менен кызматташу...
1	Josephine	Darakjy	4 B Blue Ridge Blvd	Brighton	MI	48116	810-374-9840	Chào! Tôi có một vài câu hỏi về công ty của bạ...
2	Art	Venere	8 W Cerritos Ave #54	Bridgeport	NJ	8014	856-264-4130	Здраво, јас сум заинтересиран да ја испробам н...
3	Lenna	Paprocki	639 Main St	Anchorage	AK	99501	907-921-2010	Haye! Waxaan qabaa su'aalo dhowr ah oo ku saab...
4	Donette	Foller	34 Center St	Hamilton	OH	45011	513-549-4561	Здраво! Имам неколико питања о вашој компанији...
```

In [1]:
# Provided Code -- Do NOT Edt!
import pandas as pd
import requests

filename = 'ContactsList.csv'

In [2]:
# TODO: Load `ContactsList.csv` into `contacts` variable
contacts = pd.read_csv(filename)

In [3]:
# TODO: Print first 5 rows of `contacts`
contacts.head()

Unnamed: 0,first_name,last_name,address,city,state,zip,phone,Introduction
0,Art,Venere,8 W Cerritos Ave #54,Bridgeport,NJ,8014,8562644130,"Здраво, јас сум заинтересиран да ја испробам н..."
1,Abel,Maclead,37275 St Rt 17m M,Middle Island,NY,11953,6316773675,"Hej, chciałbym się skontaktować w sprawie wspó..."
2,Frieden,Richard,6360 Wilshire Blvd Ste 506,Los Angeles,CA,90048,3236553854,Živjo! Imam nekaj vprašanj o vašem podjetju v ...
3,Tina,Mendoza,10599 Michael Cliffs,Dannyfort,PA,96251,3148244193,"Ola, estou interesado en probar o seu servizo...."
4,Richard,Garcia,13942 Flores Greens,New David,KS,11905,3612617977,मलाई लाग्छ तपाईंको सेवा हाम्रो व्यवसाय को लागी...


## Part 2: API Registration & Connection Test
Before you can use the Veriphone API to filter fake leads from your reformatted data, you will: 
- Register for the Veriphone API
- Explore the documentation
- Write a function to call the API

### Problem 1: Registering for & Connecting to the Veriphone API
In this problem, you will register for a Veriphone account, and then read the documentation to determine which endpoint and parameters are required to retrieve a response.

First, navigate to [Veriphone](https://veriphone.io), and register for a new account. Then, save your API Key in the empty `veriphone_api_key` variable provided in the code cell below. Note that you will _not_ be charged – the first 1,000 requests per month are free.

In [4]:
# TODO: Fill out your API Key below
api_key = '5F630A26E918470D966FB5311E8F4478'

### Problem 2: Determining if a Phone Number is Valid

  
Review the **VERIFY** section of the [Veriphone API Documentation](https://veriphone.io/docs/v2). It explains that we need the following `params` to make a request:
- `key`: This is the `api_key` variable defined above.
- `phone`: A phone number, which will come from the `contacts.phone` column
- `default_country`: Country in which the phone number is registered. For us, this will be `'US'`. 

In addition, note that JSON returned by this API has the following shape:

```
{'status': 'success',
 'phone': '9999999999',
 'phone_valid': False,
 'phone_type': 'unknown',
 'phone_region': '',
 'country': '',
 'country_code': '',
 'country_prefix': '0',
 'international_number': '+1 999-999-9999',
 'local_number': '(999) 999-9999',
 'e164': '+19999999999',
 'carrier': ''}
```

The `phone_valid` key contains the data we are looking for. If a number is valid, the value will be `True`. If a number is invalid - as in the above example - the value will be `False`.

Next, you will write a function that uses the Veriphone API to determine if a phone number is fake. You will re-use this function in a later problem.

You have been provided with starter code in the cell below. You must complete the lines containing `# TODO` to solve this problem. Follow the steps below:
- Save the URL of the **VERIFY** endpoint into the `url` variable
- Use `requests` to perform a `GET` request to `url`, using the provided `params`
- Check if the status code of the response equals `200`
  - If not, return `False`
  - If so, convert the response to JSON, then return the value of the `phone_valid` key
  


In [5]:
# Declare `is_valid_phone` function, accepting `number` argument
def is_valid_phone(number):
    # TODO: Create `url` variable containing URL of the `verify` endpoint
    base_url = 'https://api.veriphone.io/v2/verify'
    # Create `params` dictionary. This has already been provided to you below. 
    params = {
        'key': api_key,
        'phone': number,
        'default_country': 'US',
    }
    
    # TODO: Use `requests` to perform `GET` request to `url` with `params`
    response = requests.get(base_url, params = params)
    # TODO: Check if `status_code` of `response` is NOT `200`
    if response.status_code != 200:
        # TODO: If not, return `False`
        return False
    # TODO: Otherwise, convert `response` to JSON, and return value of `phone_valid` key
    return response.json()['phone_valid']



After implementing `is_valid_phone`, execute it with the following test values:
- `2024561111` (Phone Number of the US White House)
- `9999999999` (Clearly fake data)

---

Your code should print the following:

```
True

False
```

In [6]:
is_valid_phone('2024561111')

True

In [7]:
is_valid_phone('9999999999')

False

## Part 3: Filtering Out Fake Entries

Now that you've written functions to validate phone numbers, you'll use the API to augment your DataFrame with a new column indicating whether each row contains a valid phone number.

### Problem 1: Adding a `real_phone_number` Column
First, add a new column to `contacts`, called `real_phone_number?`, with a default value of `False`.

In [8]:
# TODO: Add `real_phone_number?` and `ValidAddress` columns, defaulted to `False`
contacts['real_phone_number?'] = False

### Problem 2: Identifying Real Numbers
Next, you will iterate through each row in `contacts` and set `real_phone_number?` equal to `True` for any numbers that are, in fact, real.

Follow the steps below:
- Iterate over the `index` and `row` items of `contacts`
  - Use [iterrows](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html), which is similar to `enumerate`.
  - Within the loop:
      - Extract the `phone` field from each `row` into a variable called `phone_number`.
      - Determine if this number is valid using `is_valid_phone` number.
      - Set the value of `real_phone_number?` at `index` equal to the above result.
      
      
- Print the `value_counts` of the `real_phone_number?` column.

---

Your code should print the following:

```
True     75
False    50
Name: ValidPhone, dtype: int64
```

In [9]:
# TODO: Iterate over each `index` and `row` of `contacts`
for index, row in contacts.iterrows():
    
    # TODO: Extract `phone_number` from `row`
    phone_number = row['phone']
    
    # TODO: Check if `is_valid_phone`
    real_phone_number = is_valid_phone(phone_number)
    
    # TODO: Set `real_phone_number?` equal to result of API call
    contacts.loc[index, 'real_phone_number?'] = real_phone_number

In [10]:
contacts['real_phone_number?'].value_counts()

True     75
False    50
Name: real_phone_number?, dtype: int64

### Problem 3: Finding the Real Ones
Next, create a DataFrame called `verified_contacts`, consisting of only the rows containing a valid phone number _and_ a valid address. Print the first 5 rows of the result.

Then, print out the percentage of the data that turned out to be good, so Sales has a quantitative measure of their lead quality. Use the formula `valid_contacts_length / original_length * 100`, and round your result to two decimal places.

<hr>

Your code should print the following:

```

    first_name	last_name	address	city	state	zip	phone	Introduction	real_phone_number?
0	Art	Venere	8 W Cerritos Ave #54	Bridgeport	NJ	8014	8562644130	Здраво, јас сум заинтересиран да ја испробам н...	True
1	Abel	Maclead	37275 St Rt 17m M	Middle Island	NY	11953	6316773675	Hej, chciałbym się skontaktować w sprawie wspó...	True
2	Frieden	Richard	6360 Wilshire Blvd Ste 506	Los Angeles	CA	90048	3236553854	Živjo! Imam nekaj vprašanj o vašem podjetju v ...	True
3	Tina	Mendoza	10599 Michael Cliffs	Dannyfort	PA	96251	3148244193	Ola, estou interesado en probar o seu servizo....	True
4	Richard	Garcia	13942 Flores Greens	New David	KS	11905	3612617977	मलाई लाग्छ तपाईंको सेवा हाम्रो व्यवसाय को लागी...	True

60.0%
```

In [11]:
# TODO: Filter for only `verified_contacts`
verified_contacts = contacts.loc[(contacts['real_phone_number?'] == True)]

In [12]:
# TODO: Print `verified_contacts`
verified_contacts.head() 

Unnamed: 0,first_name,last_name,address,city,state,zip,phone,Introduction,real_phone_number?
0,Art,Venere,8 W Cerritos Ave #54,Bridgeport,NJ,8014,8562644130,"Здраво, јас сум заинтересиран да ја испробам н...",True
1,Abel,Maclead,37275 St Rt 17m M,Middle Island,NY,11953,6316773675,"Hej, chciałbym się skontaktować w sprawie wspó...",True
2,Frieden,Richard,6360 Wilshire Blvd Ste 506,Los Angeles,CA,90048,3236553854,Živjo! Imam nekaj vprašanj o vašem podjetju v ...,True
3,Tina,Mendoza,10599 Michael Cliffs,Dannyfort,PA,96251,3148244193,"Ola, estou interesado en probar o seu servizo....",True
4,Richard,Garcia,13942 Flores Greens,New David,KS,11905,3612617977,मलाई लाग्छ तपाईंको सेवा हाम्रो व्यवसाय को लागी...,True


In [13]:
# TODO: Print percentage of valid contacts
print(str(len(verified_contacts)/ len(contacts) * 100) +'%')

60.0%


That implies a 40% fake lead rate. This is below the 75% threshold the Sales team had set, which means that they will not prioritize any improvements to the web form. 

### Problem 4: Export CSV of Real Contacts
Finally, save the real contacts a CSV, called `FilteredContactsList.csv`. The first five rows of this spreadsheet should look as follows:

```
first_name,last_name,address,city,state,zip,phone,Introduction,real_phone_number?
Art,Venere,8 W Cerritos Ave #54,Bridgeport,NJ,8014,8562644130,"Здраво, јас сум заинтересиран да ја испробам нашата услуга. Кога би било добро време да закажете консултација?",True
Abel,Maclead,37275 St  Rt 17m M,Middle Island,NY,11953,6316773675,"Hej, chciałbym się skontaktować w sprawie współpracy z twoimi usługami. Kiedy jest dobry czas na rozmowę?",True
Frieden,Richard,6360 Wilshire Blvd Ste 506,Los Angeles,CA,90048,3236553854,Živjo! Imam nekaj vprašanj o vašem podjetju v zvezi z vzpostavljanjem partnerstva. Ali imate ta ali naslednji teden nekaj časa za klepet o logistiki?,True
Tina,Mendoza,10599 Michael Cliffs,Dannyfort,PA,96251,3148244193,"Ola, estou interesado en probar o seu servizo. Cando sería un bo momento para programar unha consulta?",True
```

**Hints**
- Be sure to omit the index when you create your CSV.

In [14]:
# TODO: Finally, output real contacts to`FilteredContactsList.csv`
contacts.to_csv('FilteredContactsList.csv', index=False)