<h1> <center> GEOG 172: INTERMEDIATE GEOGRAPHICAL ANALYSIS </h1>
    <h2> <center> Evgeny Noi </h2>
        <h3> <center> Lecture 05: GeoVisualization </h3>

# GeoVisualization 

* The process of interactively visualizing geographic information at any step of a spatial analysis. 
* Geovisualization (geovis) originated in cartography and was pioneered under the leadership of Alan MacEachren (Penn State), who developed tools and methods for interactive exploratory data analysis. 
* A core argument for geovisualization is that visual thinking using maps is integral to the scientific process and hypothesis generation, and the role of maps grew beyond communicating the end results of an analysis or documentation process. 
* Geovis interacts with cartography, visual analytics, information visualization, scientific visualization, statistics, computer science, art-and-design, and cognitive science. 

*Source: https://gistbok.ucgis.org/bok-topics/geovisualization* 

|||
|---|---|
|<img src="https://gistbok.ucgis.org/sites/default/files/CV35-Fig1b_350.png">|<img src="https://gistbok.ucgis.org/sites/default/files/CV35-Fig1a-350v.png">|
|Cartography Cube|Swoopy Framework|

# What does geoviz look like in practice? 

<img src="https://gistbok.ucgis.org/sites/default/files/CV35-Fig3_0.png">

# Skills for GeoVisualization 

* GUI-based tools and scripting languages (Processing, Python, D3JS)
* Data processing skills
* Analytical skills (statistics)
* Visualization skills

# Typical Geovisualization Workflow 

* Raw data contains more information than we can visualize effectively 
* Thus, a good understanding of visualization techniques is required 
    * common problems: cluttered displays, overplotting
    * --> aggregate the data and highlight its interesting aspects
    * if the visualization is interactive, a **user** can explore the data and find **interesting** patterns 
* visual displays 
* visual variables (Bertin) 
* UI and UX design 
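
As a rough illustration of the "aggregate data" remedy for overplotting, the sketch below bins points into a regular grid and keeps only the count per cell. The points are randomly generated stand-ins (not real fire data), and the 20 x 20 grid size is an arbitrary assumption chosen for the example.

```python
import numpy as np

# Hypothetical point data: 10,000 random locations in a unit square (not real fire data)
rng = np.random.default_rng(42)
x = rng.random(10_000)
y = rng.random(10_000)

# Aggregate the points into a 20 x 20 grid; each cell stores how many points fall in it
counts, x_edges, y_edges = np.histogram2d(x, y, bins=20, range=[[0, 1], [0, 1]])

print(counts.shape)       # grid dimensions
print(int(counts.sum()))  # every point lands in exactly one cell
```

Drawing the 400 cell counts as a shaded grid is usually easier to read than 10,000 overlapping markers, at the cost of hiding the exact point locations.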

# Typical Data Analysis (Lab) Workflow 

* **Find the data (Cal Fire)**
* **Pre-process the data: import, inspect visually, look at missing values, filter, drop.**
* **Exploratory Data Analysis (GEOVIZ). Generate visuals and pose interesting questions about the data (descriptive statistics)**
* Data Analysis (inferential statistics) 
* Present Findings

# Data Analysis Example - California Wildfires 

1. Finding data 
    1. Cal Fire publishes data in .gdb format, which is not straightforward to analyze in Python (What software can we use to analyze it?) 
    2. Use the terminal command 'wget' to download the data from a URL 
    3. Find and download the data manually ([try it now](https://gis.data.ca.gov/maps/CALFIRE-Forestry::california-fire-perimeters-all-1))
    4. Host the data on a web service that provides a direct URL and read it with the Pandas 'read_csv' function
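
The last option can be sketched as follows. Because the hosted URL is not given here, the example reads from an in-memory text buffer that stands in for the remote file. The `FIRE_NAME` column and all row values are made up for illustration; `YEAR_`, `ALARM_DATE` and `CONT_DATE` mirror the column names used in the lab code.

```python
import io
import pandas as pd

# Hypothetical stand-in for a hosted CSV file; the rows are invented for illustration
sample_csv = io.StringIO(
    "FIRE_NAME,YEAR_,ALARM_DATE,CONT_DATE\n"
    "EXAMPLE_A,2021,2021-07-01,2021-07-05\n"
    "EXAMPLE_B,2020,2020-08-10,2020-08-12\n"
)

# pd.read_csv accepts a local path, a URL string, or a file-like object such as this buffer
fires = pd.read_csv(sample_csv)

print(fires.shape)   # (rows, columns)
print(fires.dtypes)  # the date columns are still plain strings at this point
```

With a real hosted file, the buffer would be replaced by the URL string passed directly to `pd.read_csv`.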

# Brainstorm Exercise

> Working with your table, elicit interesting questions about wildfires in CA

> Think about variables that are already in the data set 
 
> What new variables can we generate that will help us answer the questions 

In [2]:
# copied from the lab 

# Data Analysis Example - California Wildfires 

2. Pre-process the data 
    1. remove variables that you will not be using (if necessary) 
    2. subset the data to your study area 
    3. convert variables for calculation (float --> int; string --> datetime; etc.) 

In [None]:
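import pandas as pd

# `fires` is the Cal Fire table loaded earlier in the lab
# subset the data to only include one year; .copy() avoids SettingWithCopyWarning below
fires2021 = fires.loc[fires['YEAR_'] == 2021].copy()
print(fires2021.shape)

# convert data types (string --> datetime)
fires2021['ALARM_DATE'] = pd.to_datetime(fires2021['ALARM_DATE'])
fires2021['CONT_DATE'] = pd.to_datetime(fires2021['CONT_DATE'])

# calculate duration in whole days between alarm and containment
fires2021['dur_days'] = (fires2021['CONT_DATE'] - fires2021['ALARM_DATE']).dt.days

# get the month in which each fire started (1 = January, ..., 12 = December)
fires2021['month_started'] = fires2021['ALARM_DATE'].dt.month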
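
The pre-processing steps listed above (remove unused variables, convert types) can also be tried on a small toy table, together with a check for missing values. This is only a sketch: the table, its `COMMENTS` column and all values are invented for illustration, and `errors="coerce"` is one possible way to handle unparseable dates, not necessarily the choice made in the lab.

```python
import pandas as pd

# Hypothetical toy table; column names mirror the lab data, values are made up
toy = pd.DataFrame({
    "YEAR_": [2021, 2021, 2021],
    "ALARM_DATE": ["2021-07-01", "2021-08-15", None],
    "CONT_DATE": ["2021-07-05", "not recorded", "2021-09-02"],
    "COMMENTS": ["", "", ""],  # a variable we do not plan to use
})

# 1. remove variables that will not be used
toy = toy.drop(columns=["COMMENTS"])

# 2. convert strings to datetimes; entries that cannot be parsed become NaT (missing)
for col in ["ALARM_DATE", "CONT_DATE"]:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

# 3. count missing values per column, then keep only rows where both dates are present
print(toy.isna().sum())
complete = toy.dropna(subset=["ALARM_DATE", "CONT_DATE"])
print(len(complete))
```

Checking the missing-value counts before dropping rows shows how much data a later duration calculation would silently lose.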

# Exploratory Data Analysis

1. Think about how numbers (statistics), non-spatial plots, and geographic maps can help answer the questions you posed in the previous exercise. 

> useful Python functions: groupby(), reset_index(), set_index(); mean(), median(), sum(); plot(); 

In [None]:
# find the average duration of fires (in days) 
print('average duration of fires in 2021:', fires2021.dur_days.mean())

# which month has the most fire starts 
fires2021.month_started.value_counts().plot(kind='barh')

In [None]:
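# number of fires that started in each month 
# (group on 'month_started', the column created above, and count incident numbers rather than summing them)
fires_per_month = fires2021.groupby('month_started')['INC_NUM'].count().reset_index()
fires_per_month.set_index("month_started", inplace=True)
fires_per_month.plot()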
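
The same `groupby()` pattern extends to several descriptive statistics at once. The sketch below summarizes fire duration by start month on a toy table. The values are invented for illustration; only the column names `month_started` and `dur_days` come from the code above.

```python
import pandas as pd

# Hypothetical toy table reusing the derived column names; values are made up
toy = pd.DataFrame({
    "month_started": [6, 6, 7, 7, 7, 8],
    "dur_days": [2, 4, 10, 1, 4, 30],
})

# Number of fires plus mean and median duration for each start month
summary = (
    toy.groupby("month_started")["dur_days"]
       .agg(["count", "mean", "median"])
       .reset_index()
)
print(summary)
```

Comparing the mean and median per month gives a quick hint about skew: a few very long fires pull the mean above the median.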