<div>
<img src="https://edlitera-images.s3.us-east-1.amazonaws.com/new_edlitera_logo.png" width="500"/>
</div>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

In [None]:
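import os
import datetime
import numpy as np
import pandas as pd

pd.options.display.float_format = '{:,.2f}'.format

# The cells below write to an 'output' folder; create it if it is missing.
os.makedirs('output', exist_ok=True)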

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## Export data to text files (.csv, .txt, etc.)

* we can use the `.to_csv()` method

* we have lots of options: 
    * we can choose a different separator (e.g. '|', ' ')
    * we can choose to include or exclude the index values
    * we can choose to include or exclude the column headers
    * we can optionally only output a subset of the columns

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
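
The options listed above can be tried quickly on a small in-memory DataFrame. The sketch below uses made-up sample values (not the survey dataset) and relies on the fact that `to_csv()` returns the text as a string when no path is given.

```python
import pandas as pd

# Tiny made-up sample; the values are illustrative only.
sample = pd.DataFrame(
    {'Location': ['Budapest', 'Dubrovnik'],
     'Sensor A': [1.5, 1.75],
     'Sensor B': [2.0, 2.25]},
    index=pd.Index(['2020-01-01', '2020-01-02'], name='Date')
)

# With no path, to_csv() returns the text instead of writing a file.
default_csv = sample.to_csv()

# Pipe-separated, no index, no header row, only two of the columns.
custom_csv = sample.to_csv(
    sep='|',
    index=False,
    header=False,
    columns=['Location', 'Sensor A']
)

print(default_csv)
print(custom_csv)
```

Printing the string first is a convenient way to check the layout before writing a large file to disk.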

In [None]:
data = pd.read_csv(
    'https://edlitera-datasets.s3.amazonaws.com/timeseries_survey_sample.csv', 
    index_col=['Date'], 
    parse_dates=['Date']
)

data.head()

In [None]:
data.info()

<br>

Let's write to a `.csv` file all the rows where `Location` is `Budapest`, `Sensor A` is greater than 1.49, and `Sensor B` is less than 2.12.

In [None]:
output = data.loc[(data['Location'] == 'Budapest') & 
                 (data['Sensor A'] > 1.49) & 
                 (data['Sensor B'] < 2.12)]

output.head()

In [None]:
output.shape

In [None]:
# index=False leaves the Date index out of the file.
output.to_csv('output/budapest_values.csv', index=False)

**NOTE:** You can use the optional `sep` parameter of the `to_csv` method to specify a different delimiter for your data (tab, pipe, space, etc.). By default, the delimiter is `,`.

In [None]:
output.to_csv(
    'output/budapest_values.txt', 
    index=True, 
    sep='|'
)
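
A file written with a custom delimiter has to be read back with the same one. The following sketch, which uses made-up values and an in-memory buffer instead of a file on disk, shows what happens with and without the matching `sep`.

```python
import io

import pandas as pd

# Made-up sample values, for illustration only.
sample = pd.DataFrame(
    {'Sensor A': [1.5, 1.75], 'Sensor B': [2.0, 2.25]},
    index=pd.Index(['2020-01-01', '2020-01-02'], name='Date')
)

pipe_text = sample.to_csv(sep='|')

# Reading with the matching separator restores the columns.
restored = pd.read_csv(io.StringIO(pipe_text), sep='|', index_col='Date')

# Reading with the default ',' separator leaves everything in one column.
wrong = pd.read_csv(io.StringIO(pipe_text))

print(restored.shape)
print(wrong.shape)
```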

<br/>
<br/>

### How would we generate such a file for each city?
We want to do it algorithmically, not by hand. What if we're tracking 1000 locations?

In [None]:
data.head()

In [None]:
locations = data['Location'].unique()
locations

In [None]:
for location in locations:
    output = data.loc[(data['Location'] == location) & 
                      (data['Sensor A'] > 1.49) & 
                      (data['Sensor B'] < 2.12)]

    output.to_csv(
        f'output/{location.lower().strip()}_output.csv', 
        index=True
    )
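
As an alternative to looping over the unique locations, the same per-city files can be produced by filtering once and then splitting with `groupby()`. This is only a sketch on made-up data; it writes into a temporary folder rather than `output/`, and the variable names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample values, for illustration only.
sample = pd.DataFrame({
    'Location': ['Budapest', 'Budapest', 'Dubrovnik'],
    'Sensor A': [1.5, 1.25, 1.75],
    'Sensor B': [2.0, 2.0, 2.5],
})

# Apply the threshold filter once, then split the result by city.
filtered = sample.loc[(sample['Sensor A'] > 1.49) & (sample['Sensor B'] < 2.12)]

out_dir = Path(tempfile.mkdtemp())  # throwaway folder for this sketch
written = []
for location, group in filtered.groupby('Location'):
    path = out_dir / f'{location.lower().strip()}_output.csv'
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```

One difference to keep in mind: a city with no matching rows gets no file at all here, whereas the loop over `locations` writes a file containing only the header row.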

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## BONUS: Export data to Excel files

* we can use the `.to_excel()` method

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

* we have lots and lots of options

* we can specify an 'engine': either 'openpyxl' or 'xlsxwriter' (if we don't, `pandas` picks one of the installed engines)
    * these are the libraries that actually write the output in the correct format

* `pip install openpyxl` or `pip install xlsxwriter`

* `xlsxwriter` seems to be the more popular / more actively maintained at this time
    * also gives you lots of control over the actual output
    * can control font sizes, alignments, font names, etc.

* **NOTE:** We will be using `xlsxwriter` for the code below. Make sure you install it first.
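
If you are not sure which engine is installed, you can check before exporting. The helper below is a hypothetical convenience function (`pick_excel_engine` is not part of `pandas`); it only uses the standard library to look for the two packages mentioned above.

```python
import importlib.util


def pick_excel_engine(preferred=('xlsxwriter', 'openpyxl')):
    """Return the first installed engine name, or None if none is found."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = pick_excel_engine()
print(engine)
```

The returned name can then be passed as `engine=` to `to_excel()`. If it is `None`, neither package is installed yet.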

In [None]:
data = pd.read_csv(
    'https://edlitera-datasets.s3.amazonaws.com/timeseries_survey_sample.csv', 
    index_col=['Date'], 
    parse_dates=['Date']
)

data.head()

<br>

Let's write to a `.xlsx` file all the rows where `Location` is `Budapest`, `Sensor A` is greater than 1.49, and `Sensor B` is less than 2.12.

In [None]:
output = data.loc[(data['Location'] == 'Budapest') & 
                 (data['Sensor A'] > 1.49) & 
                 (data['Sensor B'] < 2.12)]

output.to_excel(
    'output/budapest_values.xlsx', 
    index=True, 
    sheet_name='Budapest',
    float_format='%.4f',
    columns=['Sensor A', 'Sensor B'],
    engine='xlsxwriter'
)

<br/>
<br/>
<br/>

### You can export pivot tables to Excel

Let's compute the average, minimum and maximum values for `Sensor A` and `Sensor B` by `Location`.

In [None]:
data.head()

In [None]:
output = data.pivot_table(
    index=['Location'],
    values=['Sensor A', 'Sensor B'],
    aggfunc=['mean', 'min', 'max']
)

output

Let's output this to Excel:

In [None]:
output.to_excel(
    'output/averages.xlsx',
    index=True, 
    sheet_name='Average Values',
    float_format='%.6f',
    engine='xlsxwriter'    
)
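
A pivot table built with several aggregation functions has two-level column headers, which end up as stacked header rows in the spreadsheet. If you prefer a single header row, one option is to flatten the column labels before exporting. The sketch below uses made-up values, and the `'Sensor A (mean)'` label style is just one possible choice.

```python
import pandas as pd

# Made-up sample values, for illustration only.
sample = pd.DataFrame({
    'Location': ['Budapest', 'Budapest', 'Dubrovnik'],
    'Sensor A': [1.0, 3.0, 2.0],
    'Sensor B': [4.0, 6.0, 5.0],
})

summary = sample.pivot_table(
    index=['Location'],
    values=['Sensor A', 'Sensor B'],
    aggfunc=['mean', 'min', 'max']
)

# Columns are (aggregation, sensor) pairs; join each pair into one label.
flat = summary.copy()
flat.columns = [f'{sensor} ({agg})' for agg, sensor in summary.columns]

print(flat.columns.tolist())
```

The flattened DataFrame can be passed to `to_excel()` exactly like the original pivot table.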

<br/>
<br/>
<br/>

### You can even export charts to Excel (more advanced)

Exporting charts in addition to data requires a slightly different approach:
* create an `ExcelWriter` object using the `pd.ExcelWriter` constructor
* pass the `ExcelWriter` object to `to_excel()`
* from the `ExcelWriter` object, get hold of the workbook and the worksheet
* create a chart using the workbook's `add_chart()` method (this is an Excel-compatible XlsxWriter chart, not a `pandas` or a `matplotlib` chart)
* insert the chart in the desired worksheet using the `insert_chart()` method of the worksheet object
* close the `ExcelWriter` object to write the file

<br>

Let's export only the data for Dubrovnik to an Excel file and include a scatter plot showing the values of `Sensor A` and `Sensor B`. 

In [None]:
data.head()

In [None]:
output = data.loc[data['Location'] == 'Dubrovnik']
output.head()

Let's create the ExcelWriter object.

In [None]:
writer = pd.ExcelWriter(
    'output/dubrovnik.xlsx', 
    engine='xlsxwriter' 
)

sheet_name = 'Data'

Now we can use this `ExcelWriter` object to output our data:

In [None]:
output.to_excel(writer, sheet_name=sheet_name)

In [None]:
# Get a reference for the Excel Workbook
workbook = writer.book

In [None]:
# Get a reference for the Excel Worksheet
worksheet = writer.sheets[sheet_name]

In [None]:
output.head()

In [None]:
# Add a scatter chart to the Workbook
chart = workbook.add_chart({'type': 'scatter'})

# Configure the chart series.
# This assumes that Sensor A values are in column B
# and Sensor B values are in column C of the worksheet
# (row 1 holds the headers, so the data starts in row 2).
# The last row of the series is the number of rows
# in our DataFrame plus one for the header row.

rows_count = output.shape[0] + 1

chart.add_series({
    'categories': f'=Data!$B$2:$B${rows_count}',
    'values': f'=Data!$C$2:$C${rows_count}',
    'marker': {'type': 'circle', 'size': 4},
    'name': 'Dubrovnik'
})

# Configure the chart axes.
chart.set_x_axis({'name': 'Sensor A'})
chart.set_y_axis({'name': 'Sensor B'})

# Insert the chart into the worksheet.
# The first argument tells us the cell
# where the chart should be inserted.
worksheet.insert_chart('F2', chart)

# Close the Excel writer object and write the file.
writer.close()
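
The hard-coded column letters above only work if the columns sit where we expect them. A possible way to avoid that is to compute the range from the column's position in the DataFrame. The two helpers below are hypothetical (they are not part of `pandas` or XlsxWriter) and assume one index column, a single header row, and a sheet name without spaces.

```python
def column_letter(position):
    """Convert a 0-based worksheet column position to a letter (0 -> 'A')."""
    letters = ''
    position += 1
    while position > 0:
        position, remainder = divmod(position - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def data_range(sheet_name, df_columns, column, n_rows, index_levels=1):
    """Build an absolute range for one DataFrame column.

    The index occupies the first `index_levels` worksheet columns and the
    header occupies row 1, so the data spans rows 2 to n_rows + 1.
    """
    position = list(df_columns).index(column) + index_levels
    letter = column_letter(position)
    return f'={sheet_name}!${letter}$2:${letter}${n_rows + 1}'


# Example with two made-up columns and 10 rows of data.
columns = ['Sensor A', 'Sensor B']
print(data_range('Data', columns, 'Sensor A', 10))
print(data_range('Data', columns, 'Sensor B', 10))
```

With the real data you would pass `output.columns` and `output.shape[0]`, then use the returned strings as the `'categories'` and `'values'` entries of `add_series()`.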

<br>

**You can read more about the different types of charts available here:**
* https://xlsxwriter.readthedocs.io/chart.html

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## Advanced: export data to `.html`
This would allow us to publish to an internal site and have the reports available to everyone.

In [None]:
data = pd.read_csv(
    'https://edlitera-datasets.s3.amazonaws.com/timeseries_survey_sample.csv',  
    parse_dates=['Date']
)

data.head()

In [None]:
html_template = '''
<html>
  <head>
    <title>Report</title>
    <link rel="stylesheet" type="text/css" 
          href="https://edlitera-datasets.s3.amazonaws.com/df_style.css"/>
  </head>
  <body>
    <center>
      <h1>Sensor Report</h1>
    </center>
    <center>
        {table}
    </center>
    <p>Have a nice day!</p>
  </body>
</html>
'''

In [None]:
locations = data['Location'].unique()
locations

In [None]:
table_html = ''

for location in locations:
    output = data.loc[(data['Sensor A'] > 1.7) & 
                      (data['Sensor A'] < 2) & 
                      (data['Location'] == location), 
                      ['Date', 'Sensor A', 'Sensor B']]
    table_html += f'<section><h2>{location}</h2>{output.to_html(index=False)}</section>'

with open('output/sensor_report.html', 'w') as f:
    f.write(html_template.format(table=table_html))
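
`to_html()` accepts a few options that are handy for reports, such as extra CSS classes and a float format. The sketch below uses made-up values; `report-table` is a hypothetical class name and `R&D Lab` is an invented location, chosen to show why text placed in a heading is worth escaping with the standard library's `html.escape()`.

```python
import html

import pandas as pd

# Made-up sample values, for illustration only.
sample = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02'],
    'Sensor A': [1.75, 1.9],
    'Sensor B': [2.0, 2.25],
})

location = 'R&D Lab'  # invented name containing a character HTML treats specially

table = sample.to_html(
    index=False,
    classes='report-table',        # hypothetical CSS class name
    float_format='{:.2f}'.format   # two decimals in every float cell
)

# Escape the heading text so '&', '<' and '>' are rendered literally.
section = f'<section><h2>{html.escape(location)}</h2>{table}</section>'

print(section)
```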