<div style="font-size: 24px; font-weight: bold; margin-bottom: 10px; text-decoration: underline;">From unknown data to professional-quality graphs</div>

# Subject
> As a data scientist, you are tasked by a newspaper to prepare 3 striking visuals to support an article about YouTube 2023 statistics.

**Goal:** During this lab, you will prepare 3 data visualizations for the general public, **with** interpretation/story, to create a clear and efficient communication tool.

**Topic:** You explore a dataset in depth and design great visuals.

<div class="alert alert-info" role="alert">
  <strong>Find your own dataset (30 minutes max) and present your choice to the instructor before going ahead.</strong>

  <p>
    Browse the internet to find an interesting dataset containing lots of data (> 1KB). For instance, you can browse <a href="https://github.com/awesomedata/awesome-public-datasets" target="_blank" rel="noopener">
      this GitHub repo
    </a>
  </p>
    Or any open/NGO/government body dataset, for example:
  <ul>
    <li>
      <a href="https://www.data.gouv.fr/datasets/carte-des-loyers-indicateurs-de-loyers-dannonce-par-commune-en-2025" target="_blank" rel="noopener">
        French Government rental price index
      </a>
    </li>
    <li>
      <a href="https://www.data.gouv.fr/datasets/population" target="_blank" rel="noopener">
        French population dataset
      </a>
    </li>
  </ul>
</div>

**Software:** You will explore the data using pandas and plotting tools in this Jupyter notebook. For the final version of your graphs, you can use any software you see fit.

**Duration in class:** 6h (exploration of the data and hand design of the visuals) + 2h for the full-class presentation of your work.

**In this Notebook/Report:**
- Present the dataset and your understanding of it
- Explain what data analysis/exploration you have done. Feel free to manipulate the data and extract/compute information from the set.
- Show inspiration sources you have used (if any)
- Show your draft version(s), including a hand-drawn graph layout for each of your graphs
- Write a storyboard (not a story)
- Describe building steps of each of your visuals
- Save each of your visuals in a separate file for quick review.

**Graph types:**
- One graph has to be a map if the dataset contains geographical data
- One has to be picked from this list of less common graphs: violin plot, ridgeline plot, Sankey diagram, heatmap, radial plot, dumbbell plot, or scatter plot encoding 3 or 4 variables
- Free graph type: check your choice with the instructor early on

<div class="alert alert-danger" role="alert">
 Pie charts and simple bar charts / line charts are strictly <b>forbidden</b>. 
</div>

Each graph has to deal with a different aspect of the dataset. It is a plus if they form a coherent set.

**Prepare** a storyboard for each graph, to be presented to the instructor before the end of Thursday's session.

**Remarks:** The visuals may contain subplots, text or any visual cue that will help the reader understand your point!
 
**Grading:**

- **groups of 2 only**
- this has to be original work
- 5/20 is based on your presentation.
- 30% of the total grade

<div class="alert alert-warning" role="alert">
   <b> General remarks about previous years : </b> 
</div>

- Lack of critical distance and of understanding of the work done, e.g. implausible values were taken as ground truth.
- General misinterpretation of what storytelling means. There is no need to write a press article. Just make sure your title (and subtitle) make the meaning of the graph obvious!
- Too many simple graphs without proper packaging.

# Data exploration

<div class="alert alert-info" role="alert">
    <b> Context: </b> Your first task is to explore the dataset with Python libraries, starting with pandas and data viz tools such as matplotlib, plotly, seaborn... Feel free to use online resources for inspiration and code snippets. You will have to present your work, so make sure you understand what you are doing.
    <p>The datasets are not perfect. Make sure the data you're plotting is clean.</p>
</div>

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import numpy as np

In [3]:
geodata=gpd.read_file("final_lab_2025_data/POSTFIRE_MASTER_DATA_SHARE_140463065990229786.geojson")

ERROR 1: PROJ: proj_create_from_database: Open of /home/llestand/miniforge3/envs/teaching/share/proj failed


In [4]:
geodata

Unnamed: 0,OBJECTID,DAMAGE,STREETNUMBER,STREETNAME,STREETTYPE,STREETSUFFIX,CITY,STATE,ZIPCODE,CALFIREUNIT,...,UTILITYMISCSTRUCTUREDISTANCE,FIRENAME,APN,ASSESSEDIMPROVEDVALUE,YEARBUILT,SITEADDRESS,GLOBALID,Latitude,Longitude,geometry
0,1,No Damage,8376.0,Quail Canyon,Road,,Winters,CA,,LNU,...,,Quail,0101090290,510000.0,1997.0,8376 QUAIL CANYON RD VACAVILLE CA 95688,e1919a06-b4c6-476d-99e5-f0b45b070de8,38.474960,-122.044465,POINT (-13585927.697 4646740.75)
1,2,Affected (1-9%),8402.0,Quail Canyon,Road,,Winters,CA,,LNU,...,,Quail,0101090270,573052.0,1980.0,8402 QUAIL CANYON RD VACAVILLE CA 95688,b090eeb6-5b18-421e-9723-af7c9144587c,38.477442,-122.043252,POINT (-13585792.707 4647093.599)
2,3,No Damage,8430.0,Quail Canyon,Road,,Winters,CA,,LNU,...,,Quail,0101090310,350151.0,2004.0,8430 QUAIL CANYON RD VACAVILLE CA 95688,268da70b-753f-46aa-8fb1-327099337395,38.479358,-122.044585,POINT (-13585941.007 4647366.034)
3,4,No Damage,3838.0,Putah Creek,Road,,Winters,CA,,LNU,...,,Quail,0103010240,134880.0,1981.0,3838 PUTAH CREEK RD WINTERS CA 95694,64d4a278-5ee9-414a-8bf4-247c5b5c60f9,38.487313,-122.015115,POINT (-13582660.52 4648497.399)
4,5,No Damage,3830.0,Putah Creek,Road,,Winters,CA,,LNU,...,,Quail,0103010220,346648.0,1980.0,3830 PUTAH CREEK RD WINTERS CA 95694,1b44b214-01fd-4f06-b764-eb42a1ec93d7,38.485636,-122.016122,POINT (-13582772.601 4648258.826)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100225,101217,No Damage,24198.0,Case,Court,blding A,Malibu,CA,90265.0,LAC,...,,,4458018039,2249521.0,0.0,"24198 CASE CT, MALIBU, CA 90265",1b537de2-8b97-43ee-9d5c-c5a426f604f1,34.033408,-118.700625,POINT (-13213693.085 4033288.74)
100226,101218,No Damage,24198.0,Case,Court,blding B,Malibu,CA,90265.0,LAC,...,,,4458018039,2249521.0,0.0,"24198 CASE CT, MALIBU, CA 90265",56e3cc8b-4bf0-4beb-bd26-b54422cc31ee,34.033278,-118.700902,POINT (-13213723.924 4033271.342)
100227,101219,No Damage,24198.0,Case,Court,blding C,Malibu,CA,90265.0,LAC,...,,,4458018039,2249521.0,0.0,"24198 CASE CT, MALIBU, CA 90265",f85e9f02-a67f-4a7d-9fa1-0b8bf419d51a,34.033618,-118.701102,POINT (-13213746.22 4033317.044)
100228,101220,No Damage,24008.0,Malibu,Road,,Malibu,CA,90265.0,LAC,...,,,4458009014,5983875.0,2016.0,"24008 MALIBU RD, MALIBU, CA 90265",51b2df1f-852e-4f36-b250-b383c93e4042,34.032085,-118.698270,POINT (-13213431.001 4033111.003)
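
A quick sanity check on the preview above: the `geometry` values are in metres while `Latitude` and `Longitude` are in degrees. The sketch below assumes (this is not stated in the dataset output) that the geometry uses the Web Mercator projection (EPSG:3857), and converts the first row's point back to degrees with the standard library only. The helper name `web_mercator_to_lat_lon` is hypothetical.

```python
import math

# Sphere radius used by Web Mercator (assumed projection: EPSG:3857)
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_lat_lon(x, y):
    """Convert Web Mercator metres to (latitude, longitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(math.atan(math.sinh(y / EARTH_RADIUS_M)))
    return lat, lon


# First row of the preview: POINT (-13585927.697 4646740.75)
lat, lon = web_mercator_to_lat_lon(-13585927.697, 4646740.75)

# These should be close to the Latitude/Longitude columns of that row
# (38.474960, -122.044465) if the projection assumption holds.
print(round(lat, 4), round(lon, 4))
```

If the converted values match the `Latitude` and `Longitude` columns, the two sources of position agree, and either can be used for the map as long as the coordinate system is handled consistently.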


In [None]:
# loading data with pandas (replace the placeholder path with your own CSV file)
df = pd.read_csv("Your_CSV_file.csv", encoding="ISO-8859-1")

In [None]:
df.info()
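
After loading, a plausibility check on the columns you intend to plot is worth doing before any graph. The sketch below uses a toy frame that reuses a few `YEARBUILT` and `ASSESSEDIMPROVEDVALUE` values from the `geodata` preview above, where some rows show a build year of `0.0`. Treating `0` as "unknown" is an assumption made here for illustration, and the names `sample` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame reusing a few values from the geodata preview
sample = pd.DataFrame({
    "YEARBUILT": [1997.0, 1980.0, 0.0, 2016.0],
    "ASSESSEDIMPROVEDVALUE": [510000.0, 573052.0, 2249521.0, 5983875.0],
})

# Look at the range first: a minimum year of 0 is not believable
print(sample["YEARBUILT"].describe())

cleaned = sample.copy()
# Assumption: a build year of 0 means "unknown", not a real year
cleaned["YEARBUILT"] = cleaned["YEARBUILT"].replace(0.0, np.nan)

# Count what was flagged and recompute a statistic without the placeholder
n_missing = int(cleaned["YEARBUILT"].isna().sum())
median_year = cleaned["YEARBUILT"].median()
print(n_missing, median_year)
```

Replacing the placeholder with a missing value, rather than dropping the row, keeps the other columns of that row available for graphs that do not depend on the build year.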

# Storyboard and hand-drawn blueprint

<div class="alert alert-info" role="alert">
    <b> In this section : </b> 
Explain what you have observed in the dataset and how you plan to communicate these findings to a general audience. Provide a drawing of each expected graph. Hand in your sketch on Hippocampus by the end of the first session.
</div>

# Building the final graphs.

<div class="alert alert-info" role="alert">
    <b> Publication-ready graphs: </b> 
The graphs have to be self-contained figures. By reading them, the audience should know what they are looking at and how to interpret it, and should be able to confirm their understanding with the data shown.

Your work will be presented to the class on 12th Feb., with 5 minutes per group.
</div>