<img src="Images\Intro2.png" width=800/>

#### <font color='#F1C716'>TABLE OF CONTENTS</font>

[1. Import Data](#Topic1)<br>
[2. Combine Data](#Topic2)<br>
[3. Export Data](#Topic3)<br>


In [None]:
import pandas as pd

In this example, we are going to import data from Zomato. The company provided us with three files:
- a csv file named `basic_info.csv`, which contains basic information about each restaurant, namely:

| Variable | Description |
|:--------:|:--------:|
|  ID   |   Unique id of every restaurant|
|  name   |  Name of the restaurant   |
|  location   |  Location of the restaurant   |
|  rest_type   |  Micro characterization of the restaurant   |
|  type   |  Macro characterization of the restaurant   |

- an excel file named `additional_info.xlsx`, where additional information is provided:

| Variable | Description |
|:--------:|:--------:|
|  ID   |   Unique id of every restaurant|
|  approx_cost   |  The average cost for two people   |
|  online_order   |  Answer to the question: 'Has online order?'   |
|  book_table   |  Answer to the question: 'Has table booking?'   |

- a json file named as `rating.json`, where rating information is added:

| Variable | Description |
|:--------:|:--------:|
|  ID   |   Unique id of every restaurant|
|  rating   |  Average rating out of 5   |
|  votes   |  Number of ratings cast by people   |

The goal today is to import all the data, combine it into a single DataFrame, and save the result to a csv file.

<a id="Topic1"></a>

<div class="alert alert-block alert-success">
    
# 1. Import Data
    
</div>

<div class="alert alert-block alert-info">

## 1.1. Import CSV Files

</div>

__`Step 01`__ Import the file `basic_info.csv` using the pandas function `read_csv()` and save it as `basic_info`.

In [None]:
basic_info = pd.read_csv(r'Data\basic_info.csv')
basic_info

__What is happening?__ <br>
Csv files are usually separated with commas (','), and this is the default that `read_csv()` assumes. Sometimes, however, a csv file is delimited with another character, such as a tab ('\t'), a semicolon (';') or a vertical bar ('|'). If the delimiter is not a comma and we do not say so, every row is read into one single column.

__`Step 02`__ This time, import the file `basic_info.csv` using the pandas function `read_csv()` and save it as `basic`, but define the delimiter as '\t', with `sep = '\t'`.

In [None]:
basic = pd.read_csv(r'Data\basic_info.csv', sep = '\t')
basic
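
The effect of `sep` can also be checked without the course files. The sketch below is only an illustration: it reads a small semicolon-delimited text held in memory, and the restaurant rows in it are invented sample values, not taken from `basic_info.csv`.

```python
import io
import pandas as pd

# Invented sample content, delimited with semicolons instead of commas
semicolon_text = "ID;name;location\n1;Cafe A;Banashankari\n2;Cafe B;Banashankari\n"

# Default separator (','): each line ends up in one single column
wrong = pd.read_csv(io.StringIO(semicolon_text))

# Correct separator: the three columns are split as expected
right = pd.read_csv(io.StringIO(semicolon_text), sep=';')

print(wrong.shape)          # (2, 1)
print(right.shape)          # (2, 3)
print(list(right.columns))  # ['ID', 'name', 'location']
```

A DataFrame with one single, oddly named column is usually the first sign that the wrong delimiter was used.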

<div class="alert alert-block alert-info">

## 1.2. Import Excel Files

</div>

__`Step 03`__ Import the file `additional_info.xlsx` using the pandas function `read_excel()` and save it as `add`.

In [None]:
add = pd.read_excel(r'Data\additional_info.xlsx')
add

__What is happening?__ <br>
According to the description provided, this file should contain the approximate cost and the answers to the questions 'Has online order?' and 'Has table booking?', but we imported only the approximate cost. <br>

__Why?__<br>
Do not forget that excel files can have more than one sheet. By default, if you don't specify the sheet, you are going to import just the first one.

__`Step 04`__ This time, import the first sheet of the file `additional_info.xlsx` using the pandas function `read_excel()` and save it as `cost`.

In [None]:
cost = pd.read_excel(r'Data\additional_info.xlsx')

__`Step 05`__ Now import the second sheet of the excel file, named `Online`, and save it as `online`.

In [None]:
online = pd.read_excel(r'Data\additional_info.xlsx', sheet_name = 'Online')
online
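
When it is not clear which sheets a workbook contains, `sheet_name=None` asks `read_excel()` to return every sheet at once, as a dictionary of DataFrames. The sketch below builds a small workbook in memory so that it runs on its own. It assumes the `openpyxl` package is installed, and the sheet name `Cost` and all the values are hypothetical (only the sheet name `Online` comes from the exercise above).

```python
import io
import pandas as pd

# Build a two-sheet workbook in memory (hypothetical data, assumes openpyxl)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine='openpyxl') as writer:
    pd.DataFrame({'ID': [1, 2], 'approx_cost': [800, 300]}).to_excel(
        writer, sheet_name='Cost', index=False)
    pd.DataFrame({'ID': [1, 2], 'online_order': ['Yes', 'No']}).to_excel(
        writer, sheet_name='Online', index=False)
workbook_bytes = buffer.getvalue()

# Default behaviour: only the first sheet is read
first_sheet = pd.read_excel(io.BytesIO(workbook_bytes))

# sheet_name=None: all sheets, as a dict {sheet name -> DataFrame}
all_sheets = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=None)

print(list(first_sheet.columns))  # ['ID', 'approx_cost']
print(list(all_sheets.keys()))    # ['Cost', 'Online']
```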

<div class="alert alert-block alert-info">

## 1.3. Import JSON Files

</div>

__`Step 06`__ Import the json file `rating.json` and save it as `rating`.

In [None]:
rating = pd.read_json(r'Data\rating.json')
rating

__What is happening?__ <br>
Json files can be structured in many ways, and pandas can read several of these formats. You can define the right format using the parameter `orient`. Here are some of the formats that pandas can load directly:

| Orient | Description |
|:--------:|:--------:|
|  split   |dictionary like {index -> [index], columns -> [columns], data -> [values]}|
|  records   |list like [{column -> value}, ... , {column -> value}]|
|  index   |dictionary like {index -> {column -> value}}|
|  table   |dictionary like {‘schema’: {schema}, ‘data’: {data}}|


__`Step 07`__ Import the json file `rating.json` again and save it as `rating`, but this time define `orient = 'table'`.

In [None]:
rating = pd.read_json(r'Data\rating.json', orient = 'table')
rating
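
To get a feeling for how the `orient` formats differ, one option is to write the same small DataFrame in each format and read it back. This is only a sketch with invented ratings; the real layout of `rating.json` is not shown here.

```python
import io
import pandas as pd

# Invented sample data with the same columns as the rating file
sample = pd.DataFrame({'ID': [1, 2], 'rating': [4.1, 3.8], 'votes': [775, 787]})

restored = {}
for orient in ['split', 'records', 'index', 'table']:
    # Write the DataFrame as a json string in the given format
    json_text = sample.to_json(orient=orient)
    print(orient, '->', json_text[:60], '...')
    # Read it back, telling pandas which format to expect
    restored[orient] = pd.read_json(io.StringIO(json_text), orient=orient)

print(restored['table'])
```

The same `orient` value has to be used for writing and for reading; otherwise pandas may raise an error or build a DataFrame with an unexpected shape.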

<a id="Topic2"></a>

<div class="alert alert-block alert-success">
    
# 2. Combine Data
    
</div>

Now that we have imported all our data, it is time to combine it into one single DataFrame.

<div class="alert alert-block alert-info">

## 2.1. Join

</div>

We can use the join method to combine DataFrames based on their indexes.

__`Step 08`__ Join the DataFrame `basic` with `cost` and save the result in a new DataFrame called `df`.

In [None]:
df = basic.join(cost, lsuffix='_df1', rsuffix='_df2')

In [None]:
df
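
Because `join` pairs rows by index and not by the `ID` column, the result is only correct when both DataFrames list the restaurants in the same order. The sketch below, with invented values, shows what happens when they do not, and one possible way to join by ID instead (setting `ID` as the index first).

```python
import pandas as pd

# Invented data: the second DataFrame lists the restaurants in another order
left = pd.DataFrame({'ID': [1, 2, 3], 'name': ['A', 'B', 'C']})
right = pd.DataFrame({'ID': [3, 1, 2], 'approx_cost': [300, 100, 200]})

# Join on the default index (0, 1, 2): rows are paired by position
by_position = left.join(right, lsuffix='_df1', rsuffix='_df2')
print(by_position)  # ID_df1 and ID_df2 do not match row by row

# Join on ID: use ID as the index of both DataFrames
by_id = left.set_index('ID').join(right.set_index('ID'))
print(by_id)  # each restaurant gets its own cost
```

Comparing the two ID columns after a join, as done later in this notebook, is a quick way to confirm that the rows were paired correctly.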

<div class="alert alert-block alert-info">

## 2.2. Merge

</div>

__`Step 09`__ Merge the DataFrame `df` with `online` and save the result in a new DataFrame called `df2`.

In [None]:
df

In [None]:
df2 = df.merge(online, left_on = 'ID_df1', right_on='ID', how = 'inner')
df2
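
The parameter `how` decides which rows are kept when an ID exists in only one of the two DataFrames. The sketch below uses invented data to compare `how = 'inner'` (used above) with `how = 'left'`.

```python
import pandas as pd

# Invented data: ID 1 exists only on the left, ID 4 only on the right
left = pd.DataFrame({'ID_df1': [1, 2, 3], 'name': ['A', 'B', 'C']})
right = pd.DataFrame({'ID': [2, 3, 4], 'online_order': ['Yes', 'No', 'Yes']})

# inner: keep only the IDs present in both DataFrames
inner = left.merge(right, left_on='ID_df1', right_on='ID', how='inner')

# left: keep every row of the left DataFrame, filling gaps with NaN
left_join = left.merge(right, left_on='ID_df1', right_on='ID', how='left')

print(len(inner))      # 2
print(len(left_join))  # 3
print(left_join)
```

With an inner merge, restaurants missing from one file disappear silently, so it is worth comparing the number of rows before and after the merge.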

<div class="alert alert-block alert-info">

## 2.3. Concatenate

</div>

__`Step 10`__ Concatenate the DataFrame `rating` to `df2` and name the result `final`.

In [None]:
final = pd.concat((df2, rating), axis = 1)
final
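
With `axis = 1`, `pd.concat` places the DataFrames side by side and aligns the rows by index label. The sketch below, with invented values, shows that rows whose index exists in only one DataFrame are kept and filled with NaN, and that resetting the index first aligns the rows by position instead.

```python
import pandas as pd

# Invented data with only partially overlapping indexes
names = pd.DataFrame({'name': ['A', 'B']}, index=[0, 1])
ratings = pd.DataFrame({'rating': [4.1, 3.8]}, index=[1, 2])

# Rows are aligned by index label: labels 0, 1 and 2 all appear
side_by_side = pd.concat((names, ratings), axis=1)
print(side_by_side.shape)  # (3, 2)

# Resetting both indexes aligns the rows by position
by_position = pd.concat(
    (names.reset_index(drop=True), ratings.reset_index(drop=True)), axis=1)
print(by_position.shape)   # (2, 2)
```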

After combining our DataFrames, we have several columns with the same content, namely the IDs. Before going further, let's drop those unneeded columns like we saw in the previous week.

__`Step 11`__ Let's check first the name of our columns.

In [None]:
final.columns

__`Step 12`__ Remove all the columns related to ID, except the first column, `ID_df1`.

In [None]:
final.drop(['ID_df2','ID'], axis = 1, inplace = True)
final.columns

__`Step 13`__ Define the column `ID_df1` as the index of our DataFrame.

In [None]:
final.set_index('ID_df1', inplace = True)
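
Steps 12 and 13 change `final` in place. As an alternative sketch, the same two operations can be chained without `inplace = True`, which returns a new DataFrame and leaves the original untouched. The data below is invented and only mimics the column names used above.

```python
import pandas as pd

# Invented data with the same duplicated ID columns as in the notebook
combined = pd.DataFrame({
    'ID_df1': [1, 2],
    'ID_df2': [1, 2],
    'ID': [1, 2],
    'name': ['A', 'B'],
})

# Drop the duplicated ID columns and use ID_df1 as the index, in one chain
tidy = combined.drop(columns=['ID_df2', 'ID']).set_index('ID_df1')

print(list(tidy.columns))      # ['name']
print(tidy.index.name)         # ID_df1
print(list(combined.columns))  # original DataFrame is unchanged
```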

<div class="alert alert-block alert-info">

## 2.4. Append

</div>

In [None]:
final.head(1)

__`Step 14`__ We have a new restaurant to upload in our list! Append it to your `final` DataFrame.

In [None]:
new = {
  'name': ['O Bitoque'],
  'location': ['Banashankari'],
  'rest_type': ['Casual Dining'],
  'type': ['Buffet'],
  'approx_cost': [900],
  'online_order': ['Yes'],
  'book_table': ['Yes'],
  'rating': [4.7],
  'votes': [12] 
}

In [None]:
new = pd.DataFrame(new)
new

In [None]:
# final = final.append(new, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
final = pd.concat((final, new), ignore_index=True)
final
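
One detail worth noticing: `ignore_index = True` replaces the existing index with 0, 1, 2, ..., so IDs stored in the index are not kept. The sketch below uses invented data to show this, together with one possible alternative in which the new row receives its own ID (the value 30 is hypothetical).

```python
import pandas as pd

# Invented data: restaurant IDs stored in the index
existing = pd.DataFrame({'name': ['A', 'B']},
                        index=pd.Index([10, 20], name='ID_df1'))
new_row = pd.DataFrame({'name': ['O Bitoque']})

# ignore_index=True: the IDs are replaced by 0, 1, 2
renumbered = pd.concat((existing, new_row), ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# Alternative: give the new row an ID (hypothetical value) and keep the index
new_with_id = pd.DataFrame({'name': ['O Bitoque']},
                           index=pd.Index([30], name='ID_df1'))
kept = pd.concat((existing, new_with_id))
print(kept.index.tolist())        # [10, 20, 30]
```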

<a id="Topic3"></a>

<div class="alert alert-block alert-success">
    
# 3. Export Data
    
</div>

__`Step 15`__ Export the final dataframe to a csv file named as `final_zomato.csv`.

In [None]:
final.to_csv('final_zomato.csv')
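
By default, `to_csv()` also writes the index as the first column of the file. The sketch below, with invented data, writes to a string instead of a file to show the difference between the default and `index = False`, and how `index_col = 0` restores the index when the file is read back.

```python
import io
import pandas as pd

# Invented sample data
sample = pd.DataFrame({'name': ['A', 'B'], 'rating': [4.1, 3.8]})

# Without a path, to_csv() returns the csv content as a string
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

print(with_index.splitlines()[0])     # ,name,rating
print(without_index.splitlines()[0])  # name,rating

# Reading back: index_col=0 turns the first column into the index again
reloaded = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(reloaded.columns))         # ['name', 'rating']
```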