# Summary
### Sharing Pandas data frames as websites on AWS S3

<p> 
This project explores the link between CSV files, Pandas data frames, HTML files and AWS S3. The CSV files are read into Pandas data frames, converted into HTML files and uploaded to S3 with the 'ContentType' set to 'text/html'. This makes it possible to build a simple website structure with an index HTML file as the start page, which links to the data frame subpages. This is an easy way to share data. The data can be public or private. In the latter case, multiple ways to protect the data are available, which are not explored in detail here. 
</p> 

<p>
This project is based on the preliminary:<br>
<a href ="https://github.com/RolfChung/Boto3_FileManagement_on_AWS.git" target = _blank>
Boto3_FileManagement_on_AWS</a> project.
</p>


<p>
The project relies heavily on the  
<a href = https://boto3.amazonaws.com/v1/documentation/api/latest/index.html target=_blank> 
Boto3 documentation.</a> <br> 
According to the doc: 
</p> 

<p> 
“You use the AWS SDK for Python (Boto3) to create, configure, and manage AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services.” 
</p> 

<p>This project creates a helper package:</p>

### S3_helpers_pckg 

<p> 
The package contains a class with useful helper functions, most of which manipulate the response dicts.<br> 
Most functions are self-written, but some are adapted from other sources such as GitHub and the Boto3 documentation.<br> 
In these cases credit is given.<br> 
The package is a work in progress. 

</p> 

<p>Several topics are examined here. <br> 
For example:</p> 
<ul> 
<li>Setting up AWS clients</li> 
<li>Pandas and HTML</li> 
<li>Converting Pandas data frames to HTML</li> 
<li>Styling a data frame</li> 
<li>Pandas data frames and csv downloads from S3</li> 
<li>Creating an index html page</li> 
<li>Reading a streaming CSV object into a Pandas data frame</li>     
<li>Uploading HTML-files to AWS S3</li>   
<li>Displaying an HTML file in a Jupyter notebook cell</li> 
</ul> 

<p> 
The credentials are stored in a .env file and loaded with <a href="https://www.dotenv.org/docs" target=_blank> 
dotenv.</a>
</p> 
 

# Import packages

In [1]:
# Import packages

import pandas as pd
import pandasql
from pandasql import sqldf

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import numpy as np
import os
import time
import pprint
import sys
import re
import json
import glob 
import jinja2 # for styling data frames 

from pathlib import Path

# security
from dotenv import load_dotenv
import logging

# display html in code cells
from IPython.display import HTML, display, Markdown, Latex, Image

#### Import Boto

In [2]:
# import boto3
import boto3
import botocore

### S3_helpers_pckg


In [3]:
import S3_helpers_pckg

In [4]:
initpy_path = S3_helpers_pckg

print(type(initpy_path))
print(str(initpy_path)[1:30])


<class 'module'>
module 'S3_helpers_pckg' from


In [5]:
from S3_helpers_pckg import S3_helpers

#### Splitting the Tetuan csv file into multiple files

<p>
for later use.
</p>

In [6]:
tetuan=pd.read_csv('csv/Tetuan_City_power_consumption.csv')

for idx, chunk in enumerate(np.array_split(tetuan, 3), start=1):
    chunk.to_csv(f'csv/part_tetuan_{idx}.csv')
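
<p>
A minimal sketch of the same split using positional slicing, as an alternative to passing the data frame to np.array_split directly. Newer pandas versions may emit a deprecation warning for that call, which is an assumption worth checking against the installed version. The function name split_frame and the toy data are hypothetical.
</p>

```python
import numpy as np
import pandas as pd


def split_frame(df, n_parts):
    """Split df into n_parts row chunks of nearly equal size."""
    # Split the row positions, not the frame itself, then slice with iloc.
    position_chunks = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[positions] for positions in position_chunks]


# Toy data standing in for the Tetuan file (hypothetical values).
toy = pd.DataFrame({"Temperature": np.linspace(5.0, 7.0, 10)})
parts = split_frame(toy, 3)
print([len(part) for part in parts])
```

<p>
Each chunk keeps its original index labels, so writing the parts with to_csv and reading them back with index_col=0 restores the row numbering.
</p>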

### Check out the directories

In [7]:
cwd = os.getcwd()
# print(cwd)

#### Using pathlib instead of os

In [8]:
p = Path(".")
print(type(p))

cwd_path = p.cwd()
print(str(cwd_path)[-20:-1])

<class 'pathlib.WindowsPath'>
oto3_CSV_Pandas_HTM


In [9]:
p1 = p/"upload_files/wolfs"
constructed_windows_path=p1.absolute()

str(constructed_windows_path)[-30:-1]

'Pandas_HTML\\upload_files\\wolf'

In [10]:
# Raw strings keep the backslashes of the Windows path literal.
# os.chdir(r'G:\Other computers\Mein Laptop (1)\data_camp_projects\Turing_DataAnalysis')
os.chdir(r'G:\Other computers\Mein Laptop (1)\data_camp_projects\AWS_boto3')

### List directories

<p>using glob or os.</p>

In [13]:
directories_list = os.listdir()
directories_list

['AWS_boto3_S3_FileManagement.ipynb',
 'txt',
 '.ipynb_checkpoints',
 'data',
 '.env',
 'S3_helpers_pckg',
 'upload_files',
 'download_files',
 'csv',
 'html',
 '.git',
 'README.md',
 '.gitignore']

In [14]:
file_list = glob.glob("*")
print(file_list)


['AWS_boto3_S3_FileManagement.ipynb', 'txt', 'data', 'S3_helpers_pckg', 'upload_files', 'download_files', 'csv', 'html', 'README.md']


In [15]:
files_list = []
for root, directories, files in os.walk(cwd):
    for name in files:
        files_list.append(name)
print(files_list)

['.env', 'AWS_boto3_S3_FileManagement.ipynb', 'AWS_boto3_Sharing_PandasDataFrames_as_websites.ipynb', 'README.md', '.gitignore', 'AWS_boto3_S3_FileManagement-checkpoint.ipynb', 'AWS_boto3_Sharing_PandasDataFrames_as_websites-checkpoint.ipynb', 'README-checkpoint.md', 'part_tetuan_1.csv', 'part_tetuan_2.csv', 'part_tetuan_3.csv', 'Tetuan_City_power_consumption.csv', 'tetuan_concatenated.csv', 'tetuan_concatenated-checkpoint.csv', 'get_it_done_2019_requests_datasd.csv', 'leopard.jpg', 'lion.jpg', 'snow_lion.jpg', 'tetuan_1.html', 'tetuan_2.html', 'tetuan_3.html', 'tetuan_concatenated_3cols.html', 'tetuan_concatenated_5.html', 'tetuan_html_table_styled.html', 'tetuan_index.html', 'tetuan_1-checkpoint.html', 'tetuan_concatenated_3cols-checkpoint.html', 'tetuan_html_table_styled-checkpoint.html', 'tetuan_index-checkpoint.html', 'S3_helpers.py', 'settings.py', '__init__.py', 'S3_helpers-checkpoint.py', 'settings-checkpoint.py', '__init__-checkpoint.py', 'S3_helpers.cpython-310.pyc', 'S3_help

### Creating requirement files

In [16]:
print("Current version of Python is ", sys.version)
print(pd.__version__)
print(np.__version__)
print(sns.__version__)

Current version of Python is  3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
1.4.3
1.21.5
0.11.2


#### Making directories

In [17]:
# Create the working directories if they are not present yet.
for folder in ("txt", "upload_files", "data"):
    os.makedirs(folder, exist_ok=True)


In [18]:
!conda list > txt/requirements_file_conda_boto3.txt
!pip list > txt/requirements_file_pip_boto3.txt

#### Checking directories

In [19]:
directories=[]
files=[]


for r, d, f in os.walk(top=os.getcwd()):
    directories.append(d)
    files.append(f)
    
print(files[:2])

[['AWS_boto3_S3_FileManagement.ipynb', '.env', 'README.md', '.gitignore'], ['requirements_file_conda_boto3.txt', 'requirements_file_pip_boto3.txt']]


In [20]:
files_flattened=[item for sublist in files for item in sublist]

print(files_flattened)

['AWS_boto3_S3_FileManagement.ipynb', '.env', 'README.md', '.gitignore', 'requirements_file_conda_boto3.txt', 'requirements_file_pip_boto3.txt', 'AWS_boto3_S3_FileManagement-checkpoint.ipynb', 'AWS_boto3_Sharing_PandasDataFrames_as_websites-checkpoint.ipynb', 'README-checkpoint.md', 'get_it_done_2019_requests_datasd.csv', 'settings.py', '__init__.py', 'S3_helpers.py', '__init__.cpython-310.pyc', 'S3_helper_class_functions.cpython-310.pyc', 'settings.cpython-310.pyc', 'S3_helpers.cpython-310.pyc', 'settings-checkpoint.py', 'S3_helpers-checkpoint.py', '__init__-checkpoint.py', 'nice_cat.jpg', 'lion.jpg', 'cat_puma.jpg', 'tiger.jpg', 'cat_tiger.jpg', 'puma.jpg', 'cat_lion.jpg', 'cat_nice_cat.jpg', 'leopard.jpg', 'gepard.jpg', 'panther.jpg', 'snow_lion.jpg', 'jaguar.jpg', 'lion-checkpoint.jpg', 'leopard-checkpoint.jpg', 'tiger-checkpoint.jpg', 'jaguar-checkpoint.jpg', 'wolf.jpg', 'white_wolf.jpg', 'african_wolf.jpg', 'hyena.jpg', 'jackal.jpg', 'fox.jpg', 'fox-checkpoint.jpg', 'nile_crodile

# Setting up AWS resources

## Import AWS keys


In [21]:
 %run S3_helpers_pckg/settings.py

In [22]:
# print(Secret_Access_Key)
# print(Access_Key_ID)
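
<p>
The content of settings.py is not shown here. The following is a sketch of how such a settings module might read the keys from environment variables, under the assumption that the .env file uses the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The function name read_aws_keys and the demo values are hypothetical.
</p>

```python
import os


def read_aws_keys(environ=None):
    """Return (access_key_id, secret_access_key) from environment variables."""
    environ = os.environ if environ is None else environ
    # The variable names below are assumptions; use the names stored in .env.
    access_key_id = environ.get("AWS_ACCESS_KEY_ID")
    secret_access_key = environ.get("AWS_SECRET_ACCESS_KEY")
    if not access_key_id or not secret_access_key:
        raise RuntimeError("AWS keys are missing from the environment")
    return access_key_id, secret_access_key


# In the notebook, load_dotenv() from python-dotenv would be called first so
# that the entries of the .env file are available in os.environ.
demo_env = {"AWS_ACCESS_KEY_ID": "demo-id", "AWS_SECRET_ACCESS_KEY": "demo-secret"}
Access_Key_ID, Secret_Access_Key = read_aws_keys(demo_env)
```

<p>
Failing early with a clear error avoids passing None as a key to a client, which would only surface later as an authentication problem.
</p>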

## Starting: calling AWS services with Boto3
### Creating a session

<p>
is a fundamental operation when working with Boto3, but creating one explicitly is not required.
</p>

<p>
"A session manages state about a particular configuration. By default, a session is created for you when needed. However, it's possible and recommended that in some scenarios you maintain your own session." (Boto3 doc)
</p>

In [23]:
import boto3.session

# Create your own session
my_session = boto3.session.Session()

# Now we can create low-level clients or resource clients from our custom session
sqs = my_session.client('sqs')
s3 = my_session.resource('s3')

In [24]:
this_session=boto3.session.Session()

print(this_session)

Session(region_name='us-east-1')


<p>
AWS service operations require an AWS region.<br>
If no region is provided, the region stored in the .aws config file of the AWS CLI is used:<br>
region=us-east-1
</p>

<p>
Sessions can create resources and clients.<br>
More below.
</p>

In [25]:
session_sqs=this_session.client('sqs')
print("Memory address: {}".format(session_sqs))

Memory address: <botocore.client.SQS object at 0x000001DD53FEE3E0>


In [26]:
session_ec2=this_session.resource('ec2')
print("Memory address: {}".format(session_ec2))

Memory address: ec2.ServiceResource()


### Creating a client

<p>
"Clients provide a low-level interface to AWS whose methods map close to 1:1 with service APIs. All service operations are supported by clients. Clients are generated from a JSON service definition file." (Boto3 doc)
</p>

<p>
The name of the service and the keys are required.<br>
The service name here is 's3'.<br>
Amazon Simple Storage Service (S3) is an object store that can hold all kinds of file types.
</p>

In [27]:
s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=Access_Key_ID,
                  aws_secret_access_key=Secret_Access_Key)

### Resources

<p>
Another way to obtain a client is through a resource:
</p>

<p>
"Resources represent an object-oriented interface to Amazon Web Services (AWS). They provide a higher-level abstraction than the raw, low-level calls made by service clients."
</p>

In [28]:
# Create the resource
s3_resource = boto3.resource('s3', region_name='us-east-1', 
                             aws_access_key_id=Access_Key_ID,
                             aws_secret_access_key=Secret_Access_Key)

# Get the client from the resource
s3_resource_client = s3_resource.meta.client

s3_resource_client

<botocore.client.S3 at 0x1dd5679aa40>

### Creating another client with the helper package

In [29]:
# run S3_helpers_pckg/S3_helpers.py

### Reading a streaming CSV object into a Pandas data frame

In [30]:
csv_tet_obj = s3.get_object(Bucket='csvdata111', Key='Tetuan_City_power_consumption.csv')
print(csv_tet_obj)

{'ResponseMetadata': {'RequestId': 'MR4GNEZ8MPQW5VKA', 'HostId': 'le/cqa0hDVjn+dwYsAQPnW3T8hUHOK9k+729X1KDoMp7qp0DRFWlBY3E89AwhlgPcBe1onX1TY8=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'le/cqa0hDVjn+dwYsAQPnW3T8hUHOK9k+729X1KDoMp7qp0DRFWlBY3E89AwhlgPcBe1onX1TY8=', 'x-amz-request-id': 'MR4GNEZ8MPQW5VKA', 'date': 'Thu, 03 Nov 2022 12:09:20 GMT', 'last-modified': 'Thu, 03 Nov 2022 10:45:22 GMT', 'etag': '"03d1833b9c7b4fb5f218a67bcb5ed299"', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '4222390'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 11, 3, 10, 45, 22, tzinfo=tzutc()), 'ContentLength': 4222390, 'ETag': '"03d1833b9c7b4fb5f218a67bcb5ed299"', 'ContentType': 'binary/octet-stream', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x000001DD568000D0>}


In [31]:
tet_df=pd.read_csv(csv_tet_obj['Body'])
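
<p>
The call above works because pd.read_csv accepts any file-like object with a read() method, and the 'Body' of the response is such a stream. A small offline sketch of the same idea, with io.BytesIO standing in for the streaming body (the CSV content is hypothetical):
</p>

```python
import io

import pandas as pd

# Stand-in for the 'Body' of a get_object response (hypothetical content).
fake_body = io.BytesIO(
    b"DateTime,Temperature\n1/1/2017 0:00,6.559\n1/1/2017 0:10,6.414\n"
)

# A stream can only be consumed once, so the frame should be kept in a variable.
demo_df = pd.read_csv(fake_body)
print(demo_df.shape)
```

<p>
If the same object is needed again, calling get_object a second time is the straightforward option, since the first body has already been read.
</p>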

In [32]:
tet_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52416 entries, 0 to 52415
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   DateTime                   52416 non-null  object 
 1   Temperature                52416 non-null  float64
 2   Humidity                   52416 non-null  float64
 3   Wind Speed                 52416 non-null  float64
 4   general diffuse flows      52416 non-null  float64
 5   diffuse flows              52416 non-null  float64
 6   Zone 1 Power Consumption   52416 non-null  float64
 7   Zone 2  Power Consumption  52416 non-null  float64
 8   Zone 3  Power Consumption  52416 non-null  float64
dtypes: float64(8), object(1)
memory usage: 3.6+ MB


In [33]:
tet_df.head(2)

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434


### Presigned URLs

<p>
give temporary access to private files.
</p>

In [34]:
share_url = s3.generate_presigned_url(
    ClientMethod='get_object',
    ExpiresIn=3600,
    Params={'Bucket': 'gid-requests', 'Key': 'potholes.csv'}
)

In [35]:
# Expiration in seconds

tetuan_presigned_url=\
s3.generate_presigned_url(ClientMethod='get_object', ExpiresIn=100,
                          Params={'Bucket':'csvdata111', 'Key':'Tetuan_City_power_consumption.csv'})

print(tetuan_presigned_url[:30])
tetuan_presigned_url

https://csvdata111.s3.amazonaw


'https://csvdata111.s3.amazonaws.com/Tetuan_City_power_consumption.csv?AWSAccessKeyId=AKIAQWGNUNBLJ5E3XWJG&Signature=FnT6wuQ60RHoaK5Kq7w6WvZ%2BROM%3D&Expires=1667477464'
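
<p>
The URL printed above carries its expiration as a Unix timestamp in the Expires query parameter. A sketch of how the remaining lifetime could be checked before sharing a link, assuming this URL shape (other signature versions use different parameters and are not handled). The function name seconds_until_expiry and the shortened demo URL are hypothetical.
</p>

```python
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(presigned_url, now_epoch):
    """Return the remaining lifetime in seconds of a presigned URL."""
    query = parse_qs(urlparse(presigned_url).query)
    # Assumes 'Expires' holds a Unix timestamp, as in the URL printed above.
    expires_epoch = int(query["Expires"][0])
    return expires_epoch - now_epoch


# Shortened, hypothetical URL in the same shape as the real one.
demo_url = (
    "https://csvdata111.s3.amazonaws.com/file.csv"
    "?Signature=abc&Expires=1667477464"
)
print(seconds_until_expiry(demo_url, now_epoch=1667477364))
```

<p>
In the notebook, int(time.time()) would be passed as now_epoch. A negative result means the link has already expired.
</p>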

In [36]:
tetuan_df_presigned_url=pd.read_csv(tetuan_presigned_url)

tetuan_df_presigned_url.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52416 entries, 0 to 52415
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   DateTime                   52416 non-null  object 
 1   Temperature                52416 non-null  float64
 2   Humidity                   52416 non-null  float64
 3   Wind Speed                 52416 non-null  float64
 4   general diffuse flows      52416 non-null  float64
 5   diffuse flows              52416 non-null  float64
 6   Zone 1 Power Consumption   52416 non-null  float64
 7   Zone 2  Power Consumption  52416 non-null  float64
 8   Zone 3  Power Consumption  52416 non-null  float64
dtypes: float64(8), object(1)
memory usage: 3.6+ MB


In [37]:
S3_object_2 = S3_helpers.S3_helpers(name='z', S3_client=s3)

### Pandas data frames and csv downloads from S3

<p>
work together seamlessly.<br>
At this point it makes sense to take a closer look and load multiple files into one data frame.<br>
The goal is to download the CSV files, read them into Pandas data frames and then concatenate the data frames into one.
</p>


In [38]:
csv_file_list = glob.glob("csv/*")
tetuan_list = []

for file in csv_file_list:
    tetuan_list.append(file)
    
print(tetuan_list)

['csv\\Tetuan_City_power_consumption.csv', 'csv\\part_tetuan_1.csv', 'csv\\part_tetuan_2.csv', 'csv\\part_tetuan_3.csv', 'csv\\tetuan_concatenated.csv']


In [39]:
S3_object_2.upload_multiple_files_with_check(tetuan_list, 'csvdata111')

True

#### Getting a list of the files stored in csvdata
<p>
The response contains a list with one dict per stored file.<br>
<a href ="https://pythonexamples.org/python-list-of-dictionaries/" target = _blank>
python-list-of-dictionaries
</a>
</p>

In [40]:
response_csvdata = s3.list_objects(Bucket='csvdata111', Prefix='part_')

print(response_csvdata.keys())
print(response_csvdata['Contents'][0].keys())

dict_keys(['ResponseMetadata', 'IsTruncated', 'Marker', 'Contents', 'Name', 'Prefix', 'MaxKeys', 'EncodingType'])
dict_keys(['Key', 'LastModified', 'ETag', 'Size', 'StorageClass', 'Owner'])


<p>
Taking a look at the Contents-keys.
</p>

In [41]:
calling_csvdata = response_csvdata['Contents']

calling_csvdata_len = len(calling_csvdata)

print(calling_csvdata_len)
print(calling_csvdata[0].keys())

3
dict_keys(['Key', 'LastModified', 'ETag', 'Size', 'StorageClass', 'Owner'])


#### Accessing dicts in the list
<p>
to identify the files stored in the Bucket.
</p>

In [42]:
response_csvdata_files = []

for i in range(len(calling_csvdata)):
    key = calling_csvdata[i]['Key']
    response_csvdata_files.append(key)
    
print(type(response_csvdata_files))
print(response_csvdata_files)


<class 'list'>
['part_tetuan_1.csv', 'part_tetuan_2.csv', 'part_tetuan_3.csv']
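
<p>
A single list call returns at most 1,000 keys. For larger buckets, the keys can be collected with a paginator. This is a sketch using the list_objects_v2 paginator of the S3 client; the function name list_all_keys is hypothetical and the client is passed in as a parameter.
</p>

```python
def list_all_keys(s3_client, bucket, prefix=""):
    """Collect every object key under a prefix, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches.
        for entry in page.get("Contents", []):
            keys.append(entry["Key"])
    return keys


# Usage in the notebook (requires the s3 client created earlier):
# keys = list_all_keys(s3, "csvdata111", prefix="part_")
```

<p>
For the three part files used here, the result should match the list built by the loop above.
</p>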


<p>
Getting the CSV files stored as "Bodies" in the objects.<br>
The list from the previous step supplies the key for each get_object call.
</p>

In [43]:
data_frames_objects = []

for file in calling_csvdata:
    get_file = s3.get_object(Bucket='csvdata111', Key=file['Key'])
    # print(get_file)
    get_body = get_file['Body']
    # print(get_body)
    get_df = pd.read_csv(get_body, index_col=0)
    # print(get_df)
    data_frames_objects.append(get_df)
                             

In [44]:
print(len(data_frames_objects))

3


In [45]:
for i in range(len(data_frames_objects)):
    print(data_frames_objects[i].columns.tolist())

['DateTime', 'Temperature', 'Humidity', 'Wind Speed', 'general diffuse flows', 'diffuse flows', 'Zone 1 Power Consumption', 'Zone 2  Power Consumption', 'Zone 3  Power Consumption']
['DateTime', 'Temperature', 'Humidity', 'Wind Speed', 'general diffuse flows', 'diffuse flows', 'Zone 1 Power Consumption', 'Zone 2  Power Consumption', 'Zone 3  Power Consumption']
['DateTime', 'Temperature', 'Humidity', 'Wind Speed', 'general diffuse flows', 'diffuse flows', 'Zone 1 Power Consumption', 'Zone 2  Power Consumption', 'Zone 3  Power Consumption']


In [46]:
data_frames_objects[0].head(1)

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386


In [47]:
data_frames_objects[0].shape

(17472, 9)

In [48]:
total_observations = 0
for i in range(len(data_frames_objects)):
    total_observations += data_frames_objects[i].shape[0]
    
print("total_observations: {}".format(total_observations))

total_observations: 52416


In [49]:
tetuan_concatenated = pd.concat(data_frames_objects)

In [50]:
tetuan_concatenated.head(4)

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711


In [51]:
tetuan_concatenated.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 52416 entries, 0 to 52415
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   DateTime                   52416 non-null  object 
 1   Temperature                52416 non-null  float64
 2   Humidity                   52416 non-null  float64
 3   Wind Speed                 52416 non-null  float64
 4   general diffuse flows      52416 non-null  float64
 5   diffuse flows              52416 non-null  float64
 6   Zone 1 Power Consumption   52416 non-null  float64
 7   Zone 2  Power Consumption  52416 non-null  float64
 8   Zone 3  Power Consumption  52416 non-null  float64
dtypes: float64(8), object(1)
memory usage: 4.0+ MB


In [52]:
print("total_observations after concat: {}".format(tetuan_concatenated.shape[0]))

total_observations after concat: 52416


<p>
In the next step this "ETL" operation should be packed into the helper package.
</p>
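
<p>
One possible shape for such a helper, as a sketch only and not the implementation in S3_helpers_pckg. The function name concat_csv_objects is hypothetical; it assumes every object under the prefix is a CSV file whose first column is the index.
</p>

```python
import pandas as pd


def concat_csv_objects(s3_client, bucket, prefix):
    """Download every CSV object under a prefix and concatenate the frames."""
    response = s3_client.list_objects(Bucket=bucket, Prefix=prefix)
    frames = []
    for entry in response.get("Contents", []):
        body = s3_client.get_object(Bucket=bucket, Key=entry["Key"])["Body"]
        # index_col=0 restores the index column written by to_csv.
        frames.append(pd.read_csv(body, index_col=0))
    # pd.concat raises a ValueError if no object matched the prefix.
    return pd.concat(frames)


# Usage in the notebook (requires the s3 client created earlier):
# tetuan_concatenated = concat_csv_objects(s3, "csvdata111", "part_")
```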

In [53]:
tetuan_concatenated.to_csv("csv/tetuan_concatenated.csv", index=False, header=True)

## Pandas and HTML
### Converting Pandas data frames to HTML

In [54]:
# from IPython.display import HTML

#### Displaying html in a Jupyter cell using cell magic

In [55]:
%%html
<ul>
    <li>foo</li>
    <li>bar</li>
</ul>



#### DF to HTML

In [56]:
tetuan_concatenated[:5]
tetuan_concatenated_5=tetuan_concatenated[:5]

In [57]:
tetuan_concatenated_5.to_html("html/tetuan_concatenated_5.html")

In [58]:
display(HTML("html/tetuan_concatenated_5.html"))

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711
4,1/1/2017 0:40,5.921,75.7,0.081,0.048,0.085,27335.6962,17872.34043,18442.40964
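
<p>
df.to_html() produces only the table markup. When the file is meant to be opened as a standalone page, it can help to wrap the table in a complete HTML document. A minimal sketch; the template and the function name frame_to_page are hypothetical, and the title is inserted without escaping.
</p>

```python
import pandas as pd

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
<h1>{title}</h1>
{table}
</body>
</html>
"""


def frame_to_page(df, title):
    """Return a complete HTML document that contains df as a table."""
    # Without a target path, to_html returns the table markup as a string.
    return PAGE_TEMPLATE.format(title=title, table=df.to_html())


demo = pd.DataFrame({"Humidity": [73.8, 74.5], "Wind Speed": [0.083, 0.083]})
page = frame_to_page(demo, "Tetuan sample")
```

<p>
The resulting string can be written to a file in the html folder and uploaded like the other pages.
</p>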


#### Selecting cols in df.to_html()

In [59]:
tetuan_concatenated_5.to_html('html/tetuan_concatenated_3cols.html',
                               render_links=True, col_space=30,
                               columns=['Wind Speed', 'diffuse flows', 'Humidity'])

In [60]:
display(HTML('html/tetuan_concatenated_3cols.html'))

Unnamed: 0,Wind Speed,diffuse flows,Humidity
0,0.083,0.119,73.8
1,0.083,0.085,74.5
2,0.08,0.1,74.5
3,0.083,0.096,75.0
4,0.081,0.085,75.7


### Styling a data frame
<p>
before exporting it to HTML.<br>
Styling a data frame creates a new Styler object.<br>
Therefore the df.to_html() arguments no longer apply.<br>
Columns can only be selected before the data frame is styled.<br>
<a href="https://betterdatascience.com/style-pandas-dataframes/" target="_blank">
More info on betterdatascience.
</a>
</p>

In [61]:
# import jinja2
# for styling data frames


tetuan_concatenated_5=tetuan_concatenated[:5]

tetuan_concatenated_51=\
tetuan_concatenated_5.\
style.set_properties(**{'background-color': 'black','color': 'red', 
                        'border': '1px solid red', 'text-align':'left', 
                        'caption-side': 'bottom', 'font-size': '1em'})

tetuan_concatenated_51

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711
4,1/1/2017 0:40,5.921,75.7,0.081,0.048,0.085,27335.6962,17872.34043,18442.40964


#### Additional styling features
<p>
like cell_hover.
</p>

In [62]:
# df.style.format_index(axis=1, na_rep='MISS', precision=3)  
# https://betterdatascience.com/style-pandas-dataframes/
cell_hover = {
    "selector": "td:hover",
    "props": [("background-color", "#FFFFE0")]
}
index_names = {
    "selector": ".index_name",
    "props": "font-style: italic; color: darkgrey; font-weight:normal;"
}
headers = {
    "selector": "th:not(.index_name)",
    "props": "background-color: #800000; color: white;"
}

tetuan_concatenated_51 = \
tetuan_concatenated_51.set_table_styles([cell_hover, index_names, headers])


tetuan_concatenated_51 

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711
4,1/1/2017 0:40,5.921,75.7,0.081,0.048,0.085,27335.6962,17872.34043,18442.40964


#### Styling object

In [63]:
type(tetuan_concatenated_51)

pandas.io.formats.style.Styler

In [64]:
tetuan_html_table_styled=tetuan_concatenated_51.to_html()

print(type(tetuan_html_table_styled))
print(tetuan_html_table_styled[2000:3000])

<class 'str'>
level0_col7" class="col_heading level0 col7" >Zone 2  Power Consumption</th>
      <th id="T_26609_level0_col8" class="col_heading level0 col8" >Zone 3  Power Consumption</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th id="T_26609_level0_row0" class="row_heading level0 row0" >0</th>
      <td id="T_26609_row0_col0" class="data row0 col0" >1/1/2017 0:00</td>
      <td id="T_26609_row0_col1" class="data row0 col1" >6.559000</td>
      <td id="T_26609_row0_col2" class="data row0 col2" >73.800000</td>
      <td id="T_26609_row0_col3" class="data row0 col3" >0.083000</td>
      <td id="T_26609_row0_col4" class="data row0 col4" >0.051000</td>
      <td id="T_26609_row0_col5" class="data row0 col5" >0.119000</td>
      <td id="T_26609_row0_col6" class="data row0 col6" >34055.696200</td>
      <td id="T_26609_row0_col7" class="data row0 col7" >16128.875380</td>
      <td id="T_26609_row0_col8" class="data row0 col8" >20240.963860</td>
    </tr>
    <tr>
      <th id="T_266

#### Writing the file to disk

In [65]:
tetuan_concatenated_51.to_html('html/tetuan_html_table_styled.html')

#### Displaying an html file
<p>
in a Jupyter notebook cell with:<br>
from IPython.display import display, HTML
</p>

In [66]:
# Displaying HTML
display(HTML('<h3>Hello, world!</h3>'))

In [67]:
# from IPython.display import display, HTML
# This displays the HTML style stored in a variable from above
display(HTML(tetuan_html_table_styled))

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711
4,1/1/2017 0:40,5.921,75.7,0.081,0.048,0.085,27335.6962,17872.34043,18442.40964


#### Reading the HTML file from disk
<p>
and displaying it.
</p>

In [68]:
# The with statement closes the file automatically.
with open('html/tetuan_html_table_styled.html', 'r') as file:
    tetuan_html_table_styled_disc=HTML(file.read())

In [69]:
tetuan_html_table_styled_disc

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386
1,1/1/2017 0:10,6.414,74.5,0.083,0.07,0.085,29814.68354,19375.07599,20131.08434
2,1/1/2017 0:20,6.313,74.5,0.08,0.062,0.1,29128.10127,19006.68693,19668.43373
3,1/1/2017 0:30,6.121,75.0,0.083,0.091,0.096,28228.86076,18361.09422,18899.27711
4,1/1/2017 0:40,5.921,75.7,0.081,0.048,0.085,27335.6962,17872.34043,18442.40964


### Uploading HTML-files to AWS S3

<p>
by defining the ContentType in ExtraArgs.
</p>

In [70]:
htmlfiles_5555 = s3.create_bucket(Bucket='htmlfiles45324')

In [71]:
s3.upload_file(Filename='html/tetuan_html_table_styled.html',
               Bucket='htmlfiles45324',
               Key='tetuan_html_table_styled.html',
               ExtraArgs = {
               'ContentType': 'text/html',
               'ACL':'public-read'})
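
<p>
The ContentType decides whether a browser renders the file or offers it as a download, so it has to be spelled exactly. One way to avoid typing it by hand is to derive it from the file name with the standard library. A sketch; the function name extra_args_for is hypothetical.
</p>

```python
import mimetypes


def extra_args_for(filename, public=False):
    """Build the ExtraArgs dict for upload_file from the file name."""
    content_type, _ = mimetypes.guess_type(filename)
    # Fall back to the generic type that S3 reports for untyped uploads.
    extra_args = {"ContentType": content_type or "binary/octet-stream"}
    if public:
        # Only works if the bucket permits ACLs.
        extra_args["ACL"] = "public-read"
    return extra_args


print(extra_args_for("html/tetuan_html_table_styled.html", public=True))
```

<p>
The returned dict can be passed as ExtraArgs to s3.upload_file.
</p>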

#### Displaying the file as an html page

In [72]:
# https://{bucket}.s3.amazonaws.com/{key}
html_page = \
"https://{}.S3.amazonaws.com/{}".format('htmlfiles45324', 
                                        'tetuan_html_table_styled.html')
                                 

print(html_page)

https://htmlfiles45324.S3.amazonaws.com/tetuan_html_table_styled.html
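
<p>
Keys that contain spaces or other special characters need to be percent-encoded before they are placed in a URL. A small sketch of a URL builder for the endpoint form used above; the function name object_url and the second example key are hypothetical.
</p>

```python
from urllib.parse import quote


def object_url(bucket, key):
    """Return the virtual-hosted-style URL of an object."""
    # quote() percent-encodes characters such as spaces but keeps '/'.
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))


print(object_url("htmlfiles45324", "tetuan_html_table_styled.html"))
print(object_url("htmlfiles45324", "reports/tetuan part 1.html"))
```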


### Creating an index html page
<p>
or start page, the main page of a website.
</p>

<p>
In preparation, HTML files are created.<br>
Later these HTML files are linked from the index page as subpages.
</p>
    

In [73]:
tetuan_1 = pd.read_csv("csv/part_tetuan_1.csv", index_col=0, nrows=20)
tetuan_1.to_html("html/tetuan_1.html")
tetuan_2 = pd.read_csv("csv/part_tetuan_2.csv", index_col=0, nrows=20)
tetuan_2.to_html("html/tetuan_2.html")
tetuan_3 = pd.read_csv("csv/part_tetuan_3.csv", index_col=0, nrows=20)
tetuan_3.to_html("html/tetuan_3.html")
tetuan_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20 entries, 0 to 19
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   DateTime                   20 non-null     object 
 1   Temperature                20 non-null     float64
 2   Humidity                   20 non-null     float64
 3   Wind Speed                 20 non-null     float64
 4   general diffuse flows      20 non-null     float64
 5   diffuse flows              20 non-null     float64
 6   Zone 1 Power Consumption   20 non-null     float64
 7   Zone 2  Power Consumption  20 non-null     float64
 8   Zone 3  Power Consumption  20 non-null     float64
dtypes: float64(8), object(1)
memory usage: 1.6+ KB


In [74]:
tetuan_1.head(1)

Unnamed: 0,DateTime,Temperature,Humidity,Wind Speed,general diffuse flows,diffuse flows,Zone 1 Power Consumption,Zone 2 Power Consumption,Zone 3 Power Consumption
0,1/1/2017 0:00,6.559,73.8,0.083,0.051,0.119,34055.6962,16128.87538,20240.96386


<p>
Uploading the html pages into the bucket.
</p>

In [75]:
for html_file in ["tetuan_1", "tetuan_2", "tetuan_3"]:
    s3.upload_file(
        Filename="html/{}.html".format(html_file),
        Key="{}.html".format(html_file),
        Bucket='htmlfiles45324',
        ExtraArgs={
            'ContentType': 'text/html',
            'ACL': 'public-read'
        })


In [76]:
# r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')
obj_for_index = s3.list_objects(Bucket='htmlfiles45324')

obj_for_index = pd.DataFrame(obj_for_index['Contents'])

obj_for_index.info()

obj_for_index.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype                  
---  ------        --------------  -----                  
 0   Key           8 non-null      object                 
 1   LastModified  8 non-null      datetime64[ns, tzutc()]
 2   ETag          8 non-null      object                 
 3   Size          8 non-null      int64                  
 4   StorageClass  8 non-null      object                 
 5   Owner         8 non-null      object                 
dtypes: datetime64[ns, tzutc()](1), int64(1), object(4)
memory usage: 512.0+ bytes


Unnamed: 0,Key,LastModified,ETag,Size,StorageClass,Owner
0,html_tetuan_1.html,2022-10-26 11:03:53+00:00,"""fe3d69dc1476051eab4fda7c81b72a34""",5448,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63..."
1,html_tetuan_2.html,2022-10-26 11:03:54+00:00,"""be59acf6baf45f81abd3d798c7ba7df5""",5546,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63..."
2,html_tetuan_3.html,2022-10-26 11:03:54+00:00,"""7ca11b3d5a1b69da641bb359d8c2272f""",5601,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63..."
3,index.html,2022-11-03 10:45:47+00:00,"""0b7158b1b45ad5f5254e24c08767d70b""",2361,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63..."
4,tetuan_1.html,2022-11-03 12:10:02+00:00,"""fe3d69dc1476051eab4fda7c81b72a34""",5448,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63..."


In [77]:
# base_url = "http://datacamp-website."
# objects_df['Link'] = base_url + objects_df['Key']

base_url = "http://htmlfiles45324.S3.amazonaws.com/"
obj_for_index['Hyperlink'] = base_url + obj_for_index['Key']

obj_for_index.head()

Unnamed: 0,Key,LastModified,ETag,Size,StorageClass,Owner,Hyperlink
0,html_tetuan_1.html,2022-10-26 11:03:53+00:00,"""fe3d69dc1476051eab4fda7c81b72a34""",5448,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63...",http://htmlfiles45324.S3.amazonaws.com/html_te...
1,html_tetuan_2.html,2022-10-26 11:03:54+00:00,"""be59acf6baf45f81abd3d798c7ba7df5""",5546,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63...",http://htmlfiles45324.S3.amazonaws.com/html_te...
2,html_tetuan_3.html,2022-10-26 11:03:54+00:00,"""7ca11b3d5a1b69da641bb359d8c2272f""",5601,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63...",http://htmlfiles45324.S3.amazonaws.com/html_te...
3,index.html,2022-11-03 10:45:47+00:00,"""0b7158b1b45ad5f5254e24c08767d70b""",2361,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63...",http://htmlfiles45324.S3.amazonaws.com/index.html
4,tetuan_1.html,2022-11-03 12:10:02+00:00,"""fe3d69dc1476051eab4fda7c81b72a34""",5448,STANDARD,"{'DisplayName': 'rolf.chung', 'ID': '4fc35fc63...",http://htmlfiles45324.S3.amazonaws.com/tetuan_...


In [78]:
# Write DataFrame to html

obj_for_index.to_html('html/tetuan_index.html', 
                      columns=['Key', 'LastModified', 'Hyperlink'], render_links=True)
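
<p>
As an alternative to exporting the data frame, the index page can be written as a plain list of links. This is a sketch using only the standard library; the function name build_index_page and the page title are hypothetical.
</p>

```python
from html import escape


def build_index_page(base_url, keys, title="Index"):
    """Return an HTML start page with one link per object key."""
    items = []
    for key in keys:
        link = escape(base_url + key, quote=True)
        items.append('<li><a href="{}">{}</a></li>'.format(link, escape(key)))
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>{0}</title></head>\n"
        "<body>\n<h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body>\n</html>\n"
    ).format(escape(title), "\n".join(items))


index_html = build_index_page(
    "http://htmlfiles45324.S3.amazonaws.com/",
    ["tetuan_1.html", "tetuan_2.html", "tetuan_3.html"],
    title="Tetuan data",
)
```

<p>
The keys could come from the 'Key' column of obj_for_index, and the string would be written to html/tetuan_index.html before the upload.
</p>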

In [79]:
obj_for_index[['Hyperlink']]



Unnamed: 0,Hyperlink
0,http://htmlfiles45324.S3.amazonaws.com/html_te...
1,http://htmlfiles45324.S3.amazonaws.com/html_te...
2,http://htmlfiles45324.S3.amazonaws.com/html_te...
3,http://htmlfiles45324.S3.amazonaws.com/index.html
4,http://htmlfiles45324.S3.amazonaws.com/tetuan_...
5,http://htmlfiles45324.S3.amazonaws.com/tetuan_...
6,http://htmlfiles45324.S3.amazonaws.com/tetuan_...
7,http://htmlfiles45324.S3.amazonaws.com/tetuan_...


In [80]:
s3.upload_file(
    Filename='html/tetuan_index.html',
    Key='index.html',
    Bucket='htmlfiles45324',
    ExtraArgs={
        'ContentType': 'text/html',
        'ACL': 'public-read'
    })



In [81]:
"http://htmlfiles45324.s3.amazonaws.com/index.html"


'http://htmlfiles45324.s3.amazonaws.com/index.html'