# <center> Working with Pandas and API data </center>

- [What is an API](#section_1)
- [Convert API Data into a Pandas Object](#section_2)
- [Use Pandas built-in functions to send and receive API data](#section_3)

<hr>

### What is an API <a class="anchor" id="section_1"></a>

In their daily work, data professionals often need to access data from third-party APIs. This architecture is widely used in modern applications, especially when the data source is updated continuously, as with stock market prices or weather data.

The term API is short for `Application Programming Interface`. It is an interface, or connection layer, that allows computer programs to communicate with each other.

Imagine you have a weather app on your smartphone and you want to check the forecast for the upcoming weekend.

Since weather data is always changing, the app (the client) needs to request that information from a third-party data source.

The server hosting that data processes the request and sends the needed response back to the client through the API.

The API layer in the middle is therefore responsible for organizing that communication, which is technically known as `request` and `response`.
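
To make the `request` and `response` terms concrete, here is a minimal sketch. It assumes the `requests` library (any HTTP client would do) and the Open Notify endpoint used later in this notebook. It only builds the request and does not send it, so it runs without a network connection.

```python
import requests

# Endpoint used later in this notebook (Open Notify astronauts data).
URL = "http://api.open-notify.org/astros.json"

# Build the "request" half of the exchange without sending anything yet.
request = requests.Request(method="GET", url=URL)
prepared = request.prepare()

print(prepared.method)  # the HTTP verb the client will use
print(prepared.url)     # the address of the API endpoint

# Sending the prepared request is what produces the "response" half, e.g.:
#   with requests.Session() as session:
#       response = session.send(prepared, timeout=10)
#   response.status_code   # numeric status reported by the server
#   response.json()        # response body decoded from JSON
```

The commented lines show how the same request could be sent; they are left out of the executed part because they depend on the remote server being reachable.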


<br>

### Convert API Data into a Pandas Object <a class="anchor" id="section_2"></a>

In [None]:
# Import Pandas library


# Import requests library to handle API connection


# Import pprint library to display data structures


# Initialize Pretty Printer 
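
One possible way to fill in the cell above is sketched here. The alias `pd`, the variable name `pp`, and the `indent=4` setting are our own choices, not requirements.

```python
import pprint

import pandas as pd
import requests

# Pretty printer for nested structures; indent=4 is an arbitrary choice.
pp = pprint.PrettyPrinter(indent=4)

# Quick check on a small nested structure (placeholder values).
pp.pprint({"message": "success", "people": [{"name": "example", "craft": "ISS"}]})
```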


In the examples below, we will use the [Open Notify](http://open-notify.org/Open-Notify-API/) API. It provides continuous updates about the current location of the International Space Station (ISS) and the crew members on board. The JSON data about astronauts is available at this [link](http://api.open-notify.org/astros.json).

In [None]:
# Pass the API query using requests library


# Convert response data into JSON format


# Examine the response data
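
The steps in the cell above could look roughly like the sketch below. The function name `fetch_astronauts`, the 10-second timeout, and the placeholder payload are assumptions made for illustration; the placeholder only mirrors the keys of the live response, and its names are made up.

```python
import pprint

import requests

ASTROS_URL = "http://api.open-notify.org/astros.json"


def fetch_astronauts(url=ASTROS_URL, timeout=10):
    """Send a GET request and return the decoded JSON body as a dict."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on HTTP error codes
    return response.json()


# Placeholder payload with the same keys as the live response.
# The names below are made up for illustration only.
sample_data = {
    "message": "success",
    "number": 3,
    "people": [
        {"name": "Astronaut A", "craft": "ISS"},
        {"name": "Astronaut B", "craft": "ISS"},
        {"name": "Astronaut C", "craft": "Tiangong"},
    ],
}

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(sample_data)
```

With network access, `data = fetch_astronauts()` followed by `pp.pprint(data)` examines the live response instead of the placeholder.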



We notice that the response data is a Python dictionary of key-value pairs. At the time of writing, it reports 10 crew members currently in space, each listed with a name and a spacecraft.

In [None]:
# Create a DataFrame of astronauts currently aboard the ISS


# Display the DataFrame
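
A minimal sketch of this step follows. It assumes the response dictionary has a `people` key holding a list of records, as in the Open Notify response. The payload below is a made-up placeholder so the example runs offline.

```python
import pandas as pd

# Placeholder payload shaped like the API response; names are made up.
data = {
    "message": "success",
    "number": 3,
    "people": [
        {"name": "Astronaut A", "craft": "ISS"},
        {"name": "Astronaut B", "craft": "ISS"},
        {"name": "Astronaut C", "craft": "Tiangong"},
    ],
}

# Each dict in the "people" list becomes one row; its keys become columns.
astronauts = pd.DataFrame(data["people"])

print(astronauts)
```

Only the `people` list is passed to `pd.DataFrame`, because the other keys (`message`, `number`) describe the response as a whole rather than individual rows.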


<br>

### Use Pandas Built-In Functions to Send and Receive API Data <a class="anchor" id="section_3"></a>

Google BigQuery is a cloud-based data warehouse that allows users to run data analytics and machine learning workloads using SQL.

If you are new to BigQuery, you can get a free sandbox account to explore the platform and follow this example.

Using my Google account, I created a new project called `pandas-io` and an empty dataset called `demo`.

In BigQuery, a dataset is a logical container for tables and views rather than a single data file.

**Note**:
<br>
<br>
Pandas has split off Google BigQuery support into the separate package `pandas-gbq`.

If you do not already have this library, install it with the `pip` command (`pip install pandas-gbq`). You also need a valid BigQuery account; the BigQuery sandbox lets you try the service for free.

We will work with two main Pandas functions, [to_gbq()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html) and [read_gbq()](https://pandas.pydata.org/docs/reference/api/pandas.read_gbq.html), to send and receive data.

### Pandas to_gbq() Function <a class="anchor" id="section_3"></a>

In [None]:
# Import library for Pandas Google BigQuery support
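
The import itself is a single line, `import pandas_gbq`. The sketch below wraps it in an availability check, which is our own addition, so that the cell gives a readable hint instead of an `ImportError` when the package is missing.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be found,
# so this check does not raise if pandas-gbq is missing.
HAS_PANDAS_GBQ = importlib.util.find_spec("pandas_gbq") is not None

if HAS_PANDAS_GBQ:
    import pandas_gbq  # BigQuery support for Pandas
else:
    print("pandas-gbq is not installed; run: pip install pandas-gbq")
```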


Notice the important parameters that we need to assign in the following example:

- `destination_table` is the name of the table to write to, in the form `dataset.tablename`

- `project_id` is the ID of the project we created earlier in our account

- `if_exists` controls what happens when the destination table already exists; with the default value `fail`, the upload raises an error


In [None]:
# Use to_gbq() method to upload a DataFrame into BigQuery platform
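
The upload could be sketched as follows. The project and dataset names come from the setup described earlier, and the table name `astronauts` matches the table queried later in this notebook; the wrapper function `upload_astronauts` is a hypothetical helper. Running it for real requires `pandas-gbq` and Google credentials, so the sketch only defines the function.

```python
import pandas as pd

PROJECT_ID = "pandas-io"               # project created earlier
DESTINATION_TABLE = "demo.astronauts"  # form: dataset.tablename


def upload_astronauts(df, if_exists="fail"):
    """Upload df to BigQuery; needs pandas-gbq and Google credentials."""
    df.to_gbq(
        destination_table=DESTINATION_TABLE,
        project_id=PROJECT_ID,
        if_exists=if_exists,  # "fail", "replace" or "append"
    )
```

Calling `upload_astronauts(astronauts)` with the DataFrame built earlier performs the upload. pandas 2.2 marks `DataFrame.to_gbq` as deprecated; on such releases, `pandas_gbq.to_gbq(df, DESTINATION_TABLE, project_id=PROJECT_ID, if_exists=if_exists)` should accept the same arguments.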


Here, if we switch to our sandbox account, we can see our astronauts table with all its values already in the cloud.

Back to the example above: if we run the same code cell one more time, it gives us an error.

The error tells us that we cannot send this DataFrame to the cloud because a table with the same name already exists in our project.

It suggests that we change the `if_exists` parameter to one of the other options: `replace` or `append`.

Let's change it to `replace` and run the cell again.

In [None]:
# Use to_gbq() method to upload a DataFrame into BigQuery platform


It seems to be successful this time.

**QUIZ**
<br>
Can you guess what would happen if we use `append`? Give it a try!

<br>

### Pandas read_gbq() Function <a class="anchor" id="section_3"></a>

Now that our data is in the cloud, let's see what else we can do.

In a real-life scenario, you would probably upload a large dataset with thousands of rows and columns and then use the BigQuery platform for your analysis.

For today, we will run a simple SQL query with a `GROUP BY` clause.

We want to know the number of people aboard each spacecraft in our astronauts table.

In [None]:
# Create a query from pandas-io.demo.astronauts
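
One way to write this query is sketched below. It assumes the table has the `name` and `craft` columns from the Open Notify data; the alias `people` is our own choice.

```python
# Standard SQL: backticks quote the full table path because the
# project ID contains a hyphen.
QUERY = """
SELECT
    craft,
    COUNT(name) AS people
FROM `pandas-io.demo.astronauts`
GROUP BY craft
"""

print(QUERY)
```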


In [None]:
# Use read_gbq function to read the query
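
Reading the query result back could look like the sketch below. The wrapper `run_query` is a hypothetical helper; it is only defined here, because executing it requires `pandas-gbq` and Google credentials.

```python
import pandas as pd


def run_query(query, project_id="pandas-io", index_col=None):
    """Run a SQL query in BigQuery and return the result as a DataFrame."""
    return pd.read_gbq(query, project_id=project_id, index_col=index_col)
```

Passing the SQL string for `pandas-io.demo.astronauts` to `run_query` returns one row per spacecraft. pandas 2.2 marks `pd.read_gbq` as deprecated; on such releases, `pandas_gbq.read_gbq(query, project_id=project_id, index_col=index_col)` should be a drop-in replacement.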


In [None]:
# Display the dataframe


You can also use other parameters, such as `index_col`, to set the index of the resulting DataFrame.

Give it a try and see what would happen!