**Course**: Data Visualization (Prof. Dr. Heike Leitte, Luisa Vollmer, RPTU Kaiserslautern),   **Name**: Faris Abu Ali,   **Date**: 09.11.2024

<div class="alert alert-info">
    
# Assignment 1 - Visualizing Data
</div>

The **goals** of the first assignment are:
- Get familiar with Python programming in a Jupyter notebook;
- Be able to create a data visualization using bokeh;
- Recreate an existing visualization and develop an eye for key features;
- Start thinking critically about design options.



To achieve these goals, your task is to create a visualization of the weather in Kaiserslautern in 2018. The visualization should be similar to the following chart from the New York Times (Jan. 11, 1981, p. 32; Tufte (1983), p. 30) and needs to be implemented in bokeh+pandas:

![New York city's weather for 1980 from the New York Times](http://euclid.psych.yorku.ca/SCS/Gallery/images/NYweather.jpg)


<div class="alert alert-danger">

**Important**: While no points will be awarded for typing the correct answers in the notebooks, it is highly advised to solve the tasks thoroughly. They are designed to be encouraging and to prepare you for the exam, deepen your understanding of the methods, and give you practical coding experience. It is mandatory to upload a Jupyter notebook with your best attempt in the questionnaire. Uploading the notebook without any changes to the code is not sufficient.
</div>

<div class="alert alert-success">
    
All tasks in this notebook are marked in green.
</div>

<div class="alert alert-info">
    
## 1. Starter Code - Minimal working example
</div>

The following pieces of code load the data for this assignment and generate a minimal chart for the temperature data. More details can be found in the [bokeh documentation](https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html).

First load all necessary python modules:

In [2]:
# Install pandas
!pip install pandas

# Install bokeh
!pip install bokeh




In [3]:
import pandas as pd

from bokeh.plotting import figure, output_notebook, show, output_file
from bokeh.models import Band, ColumnDataSource, PrintfTickFormatter, DatetimeTickFormatter, Label
from bokeh.layouts import column
from bokeh.models.tickers import MonthsTicker

# - Configure the default output state to generate output in notebook cells when the function `show` is called. 
# - Note that `show` may be called multiple times in a single cell to display multiple objects in the output cell.
# - The objects will be displayed in order.
output_notebook()

Load the data given in csv-file format using the pandas library and display the first lines of the data table.

In [4]:
df_kl = pd.read_csv('KLweather2018.csv', parse_dates=['Timestamp'], index_col='Timestamp')
df_kl_prec = pd.read_csv('KLweather2018_monthlyPrecipitation.csv', parse_dates=['Timestamp'], index_col='Timestamp')

df_kl.head()

| Timestamp | temp_min | temp_max | temp_normal_min | temp_normal_max | rel_humidity |
|---|---|---|---|---|---|
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 |
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 |
| 2018-01-02 | 4.3 | 6.4 | -0.3 | 4.996552 | 83.58 |
| 2018-01-03 | 5.4 | 10.7 | 0.310345 | 5.182759 | 83.46 |
| 2018-01-04 | 4.8 | 12.4 | -0.351724 | 5.027586 | 90.5 |
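
As a small, self-contained sketch of the loading step, the snippet below parses an in-memory CSV with the same `parse_dates` and `index_col` options. The column names mirror the assignment file, but the two rows and the names `csv_text` and `toy` are made up for illustration and are not part of the assignment data.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the real data lives in KLweather2018.csv
csv_text = (
    "Timestamp,temp_min,temp_max\n"
    "2018-01-01,4.8,9.3\n"
    "2018-01-02,4.3,6.4\n"
)

# parse_dates turns the column into datetimes, index_col makes it the row index
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["Timestamp"], index_col="Timestamp")

# Look up a single value by date
second_day_max = toy.at[pd.Timestamp("2018-01-02"), "temp_max"]

# Quick sanity check: True if no timestamp occurs more than once
timestamps_unique = toy.index.is_unique
```

Checking `index.is_unique` after loading can be a cheap way to notice repeated timestamps before plotting.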


Plot the temperature minimum as a line chart with bokeh using default settings. 

In [5]:
# create a figure
p = figure(height=400, x_axis_type="datetime")

# define the type of glyph that is rendered and its data. here: a polyline
p.line(source=df_kl, x='Timestamp', y='temp_min')

# render the chart
show(p)

<div class="alert alert-info">
    
## 2. Customizing the temperature chart
</div>

As detailed above, your visualization should look like a modern version of the one from the New York Times. This can be achieved by changing the graphical elements and styling visual properties. In the function below some elements are already changed. Update the code to make the temperature chart even more similar:

<div class="alert alert-success">
    
- Depict the normal high and low temperatures as polylines.
- Label the two polylines. You may use the legend functionality.
- Depict the daily temperature range as an area.
- Label the x-axis.
- Style visual attributes (color, line style) to your liking.
    
</div>

Helpful resources:
- [Plotting with basic glyphs](https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html) - Overview of glyph types that are implemented in bokeh; see the examples for all the graphical primitives that can be plotted directly.
- [Styling visual attributes](https://docs.bokeh.org/en/latest/docs/user_guide/styling.html) - See styling options for chart elements
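
The glyph and styling options from these resources can be tried out on a tiny example before touching the real chart. The following is a minimal sketch with made-up data: the names `toy`, `demo`, `low`, `high`, and `day` are hypothetical, and the colors and dash style are arbitrary choices rather than a recommended design.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy data: three days with a low and a high value each
toy = pd.DataFrame(
    {"low": [1.0, 2.0, 0.5], "high": [6.0, 8.0, 5.5]},
    index=pd.date_range("2018-01-01", periods=3, freq="D", name="day"),
)
source = ColumnDataSource(toy)  # the named index becomes the column "day"

demo = figure(height=250, x_axis_type="datetime")

# Filled area between the two value columns
demo.varea(source=source, x="day", y1="low", y2="high",
           fill_alpha=0.4, legend_label="Daily range")

# Two polylines with their own legend entries and line style
demo.line(source=source, x="day", y="high", color="firebrick",
          line_dash="dashed", legend_label="High")
demo.line(source=source, x="day", y="low", color="navy",
          line_dash="dashed", legend_label="Low")

# Axis label and legend placement
demo.xaxis.axis_label = "Date"
demo.legend.location = "top_left"
```

Calling `show(demo)` in a notebook cell would render the result. Giving each glyph its own `legend_label` produces separate legend entries, while reusing one label groups the glyphs under a single entry.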

In [6]:
def create_temperature_chart(df, width=900):
    '''Create a bokeh figure for temperature range and normal values.'''
    
    # create figure and data source
    p = figure(width=width, height=400, title='Kaiserslautern\'s Weather for 2018', tools=['xwheel_zoom', 'save'], 
           x_axis_type="datetime", x_axis_location="above", y_range=(-15,40))
    
    # output_file("KLweather2018-Faris-Abu-Ali.html") # Save the chart as an HTML file

    # create a ColumnDataSource to hold the data. See https://docs.bokeh.org/en/latest/docs/user_guide/basic/data.html#ug-basic-data-cds
    source = ColumnDataSource(df)

    # add graphical items
    p.line(source=source, x='Timestamp', y='temp_max')
    p.line(source=source, x='Timestamp', y='temp_min')

    # Depict the normal high and low temperatures as polylines.
    # Label the two polylines. I will use the legend_label attribute for this.
    for y in ['temp_normal_min', 'temp_normal_max']:
        p.line(source=source, x='Timestamp', y=y, color='grey', legend_label='Normal low and high')

    # Depict the daily temperature range as an area.
    p.varea(source=source, x='Timestamp', y1='temp_min', y2='temp_max', fill_alpha=0.5, legend_label='2018')
    # varea means vertical area

    # style visual attributes
    p.xaxis.ticker = MonthsTicker(months=list(range(12))) 
    p.xgrid.ticker = MonthsTicker(months=list(range(12))) 
    p.xaxis.formatter=DatetimeTickFormatter(months="               %b")
    p.xaxis.major_label_text_align = 'right'
    p.yaxis[0].formatter = PrintfTickFormatter(format="%2i°")
    p.yaxis.axis_label = "Temperature [°C]"
    p.title.text_font_size = "15pt"
    p.title.align = "center"
    
    return p

p = create_temperature_chart(df_kl)
show(p)

<div class="alert alert-info">
    
## 3. Filtering data and making annotations
</div>

The following piece of code demonstrates how to find maxima in a data column. Use this code to automatically find the highest and lowest temperature values in 2018 and place a mark in the chart above at these positions (e.g. circle the respective data points).

<div class="alert alert-success">
    
- Automatically filter the highest and lowest temperatures in Kaiserslautern in 2018.
- Integrate the code in the chart computation method above and mark the two detected positions.
- Add text labels to the positions. [Label documentation](https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#labels) for bokeh.
    
</div>
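
The filtering step relies on `idxmax` and `idxmin`, which return the index label (here a date) of the extreme value rather than the value itself. A minimal sketch on made-up numbers, where `toy` and the four temperature rows are hypothetical and not taken from the Kaiserslautern data:

```python
import pandas as pd

# Hypothetical toy data indexed by date
toy = pd.DataFrame(
    {"temp_min": [-2.0, -5.5, 1.0, 3.0], "temp_max": [4.0, 1.5, 9.0, 12.5]},
    index=pd.date_range("2018-01-01", periods=4, freq="D", name="Timestamp"),
)

# idxmax/idxmin give the index label of the extreme row
hottest_day = toy["temp_max"].idxmax()
coldest_day = toy["temp_min"].idxmin()

# .at looks up the single value at (row label, column name)
hottest_value = toy.at[hottest_day, "temp_max"]
coldest_value = toy.at[coldest_day, "temp_min"]
```

The date and value pairs found this way can then serve as the x and y coordinates of a marker or a text label in the chart.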

In [7]:
from bokeh.models import Label, LabelSet

# a decorator to add high and low temperature labels to the chart
def with_high_low_labels(chart_func):
    def wrapper(df, *args, **kwargs):
        # Call the original chart creation function
        p = chart_func(df, *args, **kwargs)

        color_map = {'temp_max': '#e74c3c', 'temp_min': '#0abde3'}
        
        # Find the dates of the highest and lowest temperatures in the given DataFrame
        tmax_id = df['temp_max'].idxmax()
        tmin_id = df['temp_min'].idxmin()

        # Use the `df` argument (not the global df_kl) so the decorator works for any DataFrame
        print("KL temperature minimum:", tmin_id, df.at[tmin_id,'temp_min'])
        print("KL temperature maximum:", tmax_id, df.at[tmax_id,'temp_max'])
        
        # Mark the two days with the highest and lowest temperatures in 2018.
        p.scatter(x=[tmax_id, tmin_id], y=[df.at[tmax_id,'temp_max'], df.at[tmin_id,'temp_min']], size=5, color=[color_map['temp_max'], color_map['temp_min']])
        
        # ⚠️ BokehDeprecationWarning: 'circle() method with size value' was deprecated in Bokeh 3.4.0 and will be removed, use 'scatter(size=...) instead' instead.
        # p.circle(x=[tmax_id, tmin_id], y=[df.at[tmax_id,'temp_max'], df.at[tmin_id,'temp_min']], size=5, color=['orange', 'steelblue'])

        # Add text labels to the two points.
        label_max = Label(x=tmax_id, y=df.at[tmax_id,'temp_max'], text='Max', text_font_size='10pt', text_color=color_map['temp_max'], x_offset=5, y_offset=5)
        label_min = Label(x=tmin_id, y=df.at[tmin_id,'temp_min'], text='Min', text_font_size='10pt', text_color=color_map['temp_min'], x_offset=5, y_offset=5)
        
        p.add_layout(label_min)
        p.add_layout(label_max)


        # === Using LabelSet ===
        # 🥲 but using LabelSet doesn't give me freedom to use different text_color for each label.
        # source = ColumnDataSource(data=dict(x=[tmin_id, tmax_id],
        #                                     y=[df_kl.at[tmin_id,'temp_min'], df_kl.at[tmax_id,'temp_max']],
        #                                     names=['Min', 'Max']))
        # labels = LabelSet(x='x', y='y', text='names', level='glyph',x_offset=5, y_offset=5, source=source, text_font_size='10pt', text_color='grey',)
        # p.add_layout(labels)
        return p
    
    return wrapper

# Wrapper function that conditionally adds labels
def create_temperature_chart_customized(df,add_labels=True, width=900):
    '''Create a temperature chart with optional high/low temperature labels.'''
    if add_labels:
        # Apply the decorator to add labels
        decorated_chart_func = with_high_low_labels(create_temperature_chart)
        return decorated_chart_func(df, width=width)
    else:
        # Call the original function without labels
        return create_temperature_chart(df, width=width)


# Integrate the code in the chart computation method above and mark the two detected positions.
p = create_temperature_chart_customized(df_kl)


show(p)

KL temperature minimum: 2018-02-28 00:00:00 -14.0
KL temperature maximum: 2018-08-04 00:00:00 35.5


<div class="alert alert-info">

## 4. Designing additional charts
</div>

Now design the charts for precipitation and relative humidity.

<div class="alert alert-success">
    
- Create the chart for precipitation. Design a bar chart using the hints below.
- Create the chart for humidity.
    
</div>

Hints for temporal x-axis:
- **Width of bars**: The width is given in milliseconds. In order to get the required scaling, you will need to specify the width like: `widthInDays = ndays*24*60*60*1000` (24 hours * 60 minutes * 60 seconds * 1000 milliseconds)
- **Position of bars**: You can shift the bars using the dodge function `x=dodge('prec', value, range=p.x_range)`. Keep in mind that you need to define an appropriate `value` by which to shift the bar.
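
The millisecond arithmetic from the hints can be checked in plain Python before using it in a chart. This is only a sketch: the helper names `month_width_ms` and `half_month_shift_ms` and the `fill` parameter are hypothetical, and whether a shift is needed at all depends on where in the month the timestamps of your data sit.

```python
import calendar

MS_PER_DAY = 24 * 60 * 60 * 1000  # hours * minutes * seconds * milliseconds


def month_width_ms(year, month, fill=0.9):
    """Bar width in milliseconds covering the fraction `fill` of a month."""
    ndays = calendar.monthrange(year, month)[1]  # number of days in that month
    return ndays * MS_PER_DAY * fill


def half_month_shift_ms(year, month):
    """Offset in milliseconds that moves a bar by half a month."""
    ndays = calendar.monthrange(year, month)[1]
    return ndays * MS_PER_DAY / 2


feb_width = month_width_ms(2018, 2, fill=1.0)  # 28 days in February 2018
jan_shift = half_month_shift_ms(2018, 1)       # half of 31 days
```

A fixed width of 28 days, as used in the cell below, is a simple approximation. Per-month widths like these would make the gaps between bars uniform, at the cost of slightly more code.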

In [8]:
def create_precipitation_chart(df, width=900):
    '''Create a bokeh figure for monthly precipitation (2018 vs normal values).'''
    from bokeh.transform import dodge
    
    p = figure(width=width, height=200, tools=['xwheel_zoom', 'save'], x_axis_type="datetime", title='Monthly Precipitation in Kaiserslautern 2018')
    p.title.text_font_size = "12pt"
    p.title.align = "center"
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))
    p.xgrid.ticker = MonthsTicker(months=list(range(12)))
    p.xaxis.formatter=DatetimeTickFormatter(months="%b")
    p.yaxis.axis_label = "Precipitation [mm]"

    # p.line(source=df, x='Timestamp', y='prec')

    width_in_days = 28*24*60*60*1000 # 28 days represented in milliseconds
    # Since each row in the DataFrame was recorded in the middle of the month, we need to dodge the bars to the right.
    x = dodge("Timestamp", width_in_days/2, range=p.x_range)
    p.vbar(source=df, x=x, top='prec', width=width_in_days, fill_alpha=0.5)


    return p

show(create_precipitation_chart(df_kl_prec))


In [9]:
def create_humidity_chart(df, width=900):
    '''Create a bokeh figure for relative humidity.'''
    
    p = figure(width=width, height=200, tools=['xwheel_zoom', 'save'], x_axis_type="datetime", title='Relative Humidity in Kaiserslautern 2018')
    p.title.text_font_size = "12pt"
    p.title.align = "center"
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))
    p.xaxis.formatter=DatetimeTickFormatter(months="%b")
    p.yaxis.axis_label = "Relative Humidity [%]"

    # p.line(source=df_kl, x='Timestamp', y='rel_humidity')

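    # Set the bar width explicitly; on a datetime axis widths are given in milliseconds.
    one_day_ms = 24*60*60*1000 # one day represented in milliseconds
    p.vbar(source=df, x='Timestamp', top='rel_humidity', width=one_day_ms, fill_alpha=0.5)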

    return p

show(create_humidity_chart(df_kl))

<div class="alert alert-info">
    
## 5. Combining multiple charts
</div>

In this last part, we combine the three charts you designed above.

<div class="alert alert-success">
    
- Create the combined weather chart for Kaiserslautern.
- Save a jpg/png-version or screenshot of this chart that can be uploaded in OLAT.
    
</div>

In [11]:
from bokeh.layouts import row, column, Spacer

padding = 20 # padding between the charts in pixels

show(column(
    children=[
        create_temperature_chart_customized(df_kl), 
        Spacer(height=padding),
        create_precipitation_chart(df_kl_prec),
        Spacer(height=padding),
        create_humidity_chart(df_kl)
    ],
    sizing_mode='scale_height',
))



KL temperature minimum: 2018-02-28 00:00:00 -14.0
KL temperature maximum: 2018-08-04 00:00:00 35.5
