
### Recap of Data, Goals, and Tasks:
__Data:__ The dataset I'm working with comprises historical weather data and climbing records from Mount Rainier. The weather data includes the date, average temperature, relative humidity, wind speed, wind direction, and solar radiation. The climbing data includes the date of each record, the route attempted, the number of attempts, the number of successful attempts, and the success percentage.

__Goals:__ My goal for working with this data is to analyze and visualize summit success based on the weather details on a given day. 

__Tasks:__
1.	Explore correlation between all climbing and weather variables
2.	Correlate weather conditions with success rates
3.	Explore trends in weather conditions over time
***

### Sketching your Data
__Task 1: Visualize Correlation Between All Climbing and Weather Variables__
- Goal: Understand how weather variables correlate with each other and summit success to identify key factors.
- Means: Create correlation matrix to quantify the relationship between weather variables (temperature, humidity, wind speed, solar radiation) and summit success.
- Characteristics: Determine the strength and direction of relationships between weather conditions and summit success rates.
- Target Data: Historical weather data and climbing records.
- Workflow: This task is performed during the exploratory data analysis phase.
- Roles: Data analysts, researchers, or climbers interested in weather's impact on climbing success.

Lo-fi prototype:

![Low-Fi Correlation Matrix](https://github.com/duba1910/msds/raw/main/Vital%20Skills/5304%20Visualization/final%20project/low_fi%20correlation%20matrix.jpg)

__Task 2: Visualize Trends in Weather Conditions over Time__
- Goal: Identify temporal patterns and trends in weather conditions to understand seasonal variations and potential climatic changes.
- Means: Create line charts or time series plots to visualize changes in temperature, humidity, wind speed, and solar radiation over time.
- Characteristics: This task seeks to learn about long-term trends, cyclical patterns, and any anomalies or outliers in the weather data.
- Target Data: Historical weather data.
- Workflow: This task is performed during the exploratory data analysis phase or when monitoring weather trends over specific time periods.
- Roles: Climatologists, meteorologists, or researchers studying the climate surrounding Mt. Rainier.

Lo-fi prototype:

![low_fi line chart.jpg](https://github.com/duba1910/msds/raw/main/Vital%20Skills/5304%20Visualization/final%20project/low_fi%20line%20chart.jpg)

__Task 3: Identify Optimal Weather Conditions for Climbing__
- Goal: Determine the ideal weather conditions for maximizing summit success rates.
- Means: Analyze historical climbing records to identify weather conditions associated with the highest success rates.
- Characteristics: This task seeks to learn about the specific thresholds or ranges of weather variables (e.g., temperature, wind speed) that are conducive to successful climbs.
- Target Data: Historical weather data and climbing records.
- Workflow: This task involves data analysis and visualization to identify patterns and trends in summit success rates under various weather conditions.
- Roles: Climbers, expedition leaders, or guide services interested in optimizing climbing strategies based on weather forecasts and historical data.

Lo-fi prototype:

![low_fi success and weather chart.jpg](https://github.com/duba1910/msds/raw/main/Vital%20Skills/5304%20Visualization/final%20project/low_fi%20success%20and%20weather.jpg)
***


### EDA (Exploratory Data Analysis)

Setup

In [1]:
import pandas as pd
import altair as alt
import altair_viewer 
import requests
from io import StringIO
import seaborn as sns
import matplotlib.pyplot as plt


Load data from my GitHub (data originally found on [Kaggle](https://www.kaggle.com/datasets/codersree/mount-rainier-weather-and-climbing-data))

In [2]:

weather_url = "https://raw.githubusercontent.com/duba1910/msds/main/Vital%20Skills/5304%20Visualization/final%20project/Rainier_Weather.csv"
climbing_url = "https://raw.githubusercontent.com/duba1910/msds/main/Vital%20Skills/5304%20Visualization/final%20project/climbing_statistics.csv"


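# Download each CSV and fail loudly on an HTTP error instead of parsing an error page
response_weather = requests.get(weather_url)
response_weather.raise_for_status()
data_weather = response_weather.content.decode('utf-8')
weather_data = pd.read_csv(StringIO(data_weather))

response_climbing = requests.get(climbing_url)
response_climbing.raise_for_status()
data_climbing = response_climbing.content.decode('utf-8')
climbing_data = pd.read_csv(StringIO(data_climbing))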

# Check the data
climbing_data.info()
weather_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4077 entries, 0 to 4076
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Date                4077 non-null   object 
 1   Route               4077 non-null   object 
 2   Attempted           4077 non-null   int64  
 3   Succeeded           4077 non-null   int64  
 4   Success Percentage  4077 non-null   float64
dtypes: float64(1), int64(2), object(2)
memory usage: 159.4+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 464 entries, 0 to 463
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Date                   464 non-null    object 
 1   Battery Voltage AVG    464 non-null    float64
 2   Temperature AVG        464 non-null    float64
 3   Relative Humidity AVG  464 non-null    float64
 4   Wind Speed Daily AVG   464 non-null    float64
 5   Wind Direction

Make required datatype changes

In [3]:
climbing_data['Date'] = pd.to_datetime(climbing_data['Date'])
weather_data['Date'] = pd.to_datetime(weather_data['Date'])
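
A minimal sketch of the same conversion with an explicit format and a check for values that fail to parse. The sample rows and the `%m/%d/%Y` format are assumptions for illustration; the real CSVs may use a different layout.

```python
import pandas as pd

# Hypothetical sample rows; the real files are loaded from GitHub above.
sample = pd.DataFrame({"Date": ["11/27/2015", "10/3/2015", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising,
# so bad rows can be counted before they quietly drop out of the join.
parsed = pd.to_datetime(sample["Date"], format="%m/%d/%Y", errors="coerce")

n_bad = int(parsed.isna().sum())
print(f"unparseable dates: {n_bad}")
```

Counting the `NaT` values first makes it obvious whether any dates were lost before filtering by year.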

Filter to show only dates in 2015 as the 2014 data is very sparse

In [4]:
climbing_data = climbing_data[climbing_data['Date'].dt.year == 2015]
weather_data = weather_data[weather_data['Date'].dt.year == 2015]

Join the climbing data with the weather data

In [5]:
joined_data = pd.merge(climbing_data, weather_data, how="left", on=["Date"])
joined_data.head()

Unnamed: 0,Date,Route,Attempted,Succeeded,Success Percentage,Battery Voltage AVG,Temperature AVG,Relative Humidity AVG,Wind Speed Daily AVG,Wind Direction AVG,Solare Radiation AVG
0,2015-11-27,Disappointment Cleaver,2,0,0.0,13.64375,26.321667,19.715,27.839583,68.004167,88.49625
1,2015-11-21,Disappointment Cleaver,3,0,0.0,13.749583,31.3,21.690708,2.245833,117.549667,93.660417
2,2015-10-15,Disappointment Cleaver,2,0,0.0,13.46125,46.447917,27.21125,17.163625,259.121375,138.387
3,2015-10-13,Little Tahoma,8,0,0.0,13.532083,40.979583,28.335708,19.591167,279.779167,176.382667
4,2015-10-09,Disappointment Cleaver,2,0,0.0,13.21625,38.260417,74.329167,65.138333,264.6875,27.791292
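
Because this is a left join, climbing rows on dates with no weather record end up with missing weather values. A small sketch of how that could be checked, using made-up miniature tables (the rows below are hypothetical, not from the dataset):

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
climbs = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-01", "2015-07-01", "2015-07-02"]),
    "Route": ["Route A", "Route B", "Route A"],
    "Attempted": [12, 4, 8],
})
weather = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-01"]),
    "Temperature AVG": [45.0],
})

# validate="many_to_one" raises if the weather table has duplicate dates;
# indicator=True adds a "_merge" column showing which rows found a match.
checked = pd.merge(climbs, weather, how="left", on="Date",
                   validate="many_to_one", indicator=True)

unmatched = int((checked["_merge"] == "left_only").sum())
print(f"{unmatched} of {len(checked)} climbing rows have no weather record")
```

The `validate` argument also guards against a weather table with duplicate dates, which would otherwise silently multiply the climbing rows.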


Remove Missing Values

In [6]:
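# Count missing values per column (only displayed if run as the last line of a cell)
joined_data.isna().sum()
# Drop rows where the left join found no weather record for the climbing date
joined_data = joined_data.dropna()
# Drop records reporting more successes than attempts.
# Note: this filters climbing_data (used for the daily aggregation later), not joined_data.
climbing_data = climbing_data[climbing_data['Success Percentage'] <= 1]
joined_data.describe()
joined_data.head()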

Unnamed: 0,Date,Route,Attempted,Succeeded,Success Percentage,Battery Voltage AVG,Temperature AVG,Relative Humidity AVG,Wind Speed Daily AVG,Wind Direction AVG,Solare Radiation AVG
0,2015-11-27,Disappointment Cleaver,2,0,0.0,13.64375,26.321667,19.715,27.839583,68.004167,88.49625
1,2015-11-21,Disappointment Cleaver,3,0,0.0,13.749583,31.3,21.690708,2.245833,117.549667,93.660417
2,2015-10-15,Disappointment Cleaver,2,0,0.0,13.46125,46.447917,27.21125,17.163625,259.121375,138.387
3,2015-10-13,Little Tahoma,8,0,0.0,13.532083,40.979583,28.335708,19.591167,279.779167,176.382667
4,2015-10-09,Disappointment Cleaver,2,0,0.0,13.21625,38.260417,74.329167,65.138333,264.6875,27.791292


There can be multiple rows for the same route on the same day, each with a different "Attempted" count, so I create a Group_ID to keep those rows distinct.

In [7]:
joined_data['Group_ID'] = joined_data['Date'].astype(str) + '_' + joined_data['Route'] + '_' + joined_data['Attempted'].astype(str) + '_' + joined_data['Succeeded'].astype(str)
joined_data['Group_ID'] = joined_data['Group_ID'].apply(hash)
#take a look at the new data to make sure it worked appropriately
joined_data[(joined_data['Route'] == 'Disappointment Cleaver') & (joined_data['Date'] == '10/3/2015')].head()

Unnamed: 0,Date,Route,Attempted,Succeeded,Success Percentage,Battery Voltage AVG,Temperature AVG,Relative Humidity AVG,Wind Speed Daily AVG,Wind Direction AVG,Solare Radiation AVG,Group_ID
5,2015-10-03,Disappointment Cleaver,10,0,0.0,13.5775,31.822917,62.337083,13.125042,153.931667,196.375208,3867840639226958325
6,2015-10-03,Disappointment Cleaver,2,0,0.0,13.5775,31.822917,62.337083,13.125042,153.931667,196.375208,-6591385863907409168
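
One caveat with `hash`: Python salts the built-in hash of strings per interpreter process, so these Group_ID values will generally differ between notebook sessions. If reproducible IDs matter, a cryptographic digest is one option. This is only a sketch; `stable_group_id` is a hypothetical helper, not part of the notebook.

```python
import hashlib

def stable_group_id(date: str, route: str, attempted: int, succeeded: int) -> str:
    """Return a reproducible ID for one climbing record (hypothetical helper)."""
    key = f"{date}_{route}_{attempted}_{succeeded}"
    # SHA-256 depends only on the input bytes, unlike the built-in hash(),
    # which is salted per interpreter process for strings.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

id_a = stable_group_id("2015-10-03", "Disappointment Cleaver", 10, 0)
id_b = stable_group_id("2015-10-03", "Disappointment Cleaver", 2, 0)
print(id_a != id_b)
```

Either way, two rows with identical date, route, attempts, and successes still receive the same ID, so the ID distinguishes records only when at least one of those fields differs.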


Next I want to create a correlation matrix to see if there are any interesting correlations within the data.

In [8]:
correlation_matrix = joined_data.drop(columns=['Success Percentage', 'Group_ID']).corr(numeric_only=True)
correlation_matrix = correlation_matrix.stack().reset_index(name='correlation').rename(columns={'level_0':'variable_1','level_1':'variable_2'})
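
By default, pandas `.corr()` reports Pearson's correlation coefficient. To make clear what each cell of the matrix measures, here is a rough hand-rolled version of that coefficient. The numbers are made up for illustration and are not taken from the dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and sums of squared deviations for each variable
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only.
temperature = [30.0, 35.0, 40.0, 45.0, 50.0]
success = [0.2, 0.4, 0.3, 0.6, 0.7]

r = pearson_r(temperature, success)
print(round(r, 3))
```

The coefficient ranges from -1 to 1 and captures only linear association, so a value near zero does not rule out a non-linear relationship.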

Visualize Correlation Matrix

`When I asked my participants about my low-fi correlation matrix, they wanted me to make sure to include a tooltip and make the values visible (not just the colors). That is why I created the text graph and then combined it with the heatmap`

In [9]:
heatmap = alt.Chart(correlation_matrix).mark_rect().encode(
    x='variable_1:N',
    y='variable_2:N',
    color='correlation:Q',
    tooltip=['variable_1', 'variable_2', 'correlation'] 
).properties(
    width=300,
    height=300
)

text = heatmap.mark_text(baseline='middle').encode(
    text=alt.Text('correlation:Q', format=".2f"),
    color=alt.condition(
        alt.datum.correlation > 0.5,
        alt.value('white'),
        alt.value('black')
    )
)

(heatmap + text).properties(
    title='Correlation Heatmap'
)

Unfortunately, nothing here really stands out. The only notable correlation is between temperature and solar radiation (which I would expect to be related, but I'm not a meteorologist).

Next I wanted to see if there were any optimal weather conditions for climbing

In [10]:
daily_climbing_data = climbing_data.groupby('Date').agg({
    'Attempted': 'sum',
    'Succeeded': 'sum'
}).reset_index()

# Filter days where the number of attempts is >= 10 since the lower number of attempts seem to skew the data
daily_climbing_data = daily_climbing_data[daily_climbing_data['Attempted'] >= 10]

#create a success rate and cap the results at 100% (it looks like there are some funky datapoints)
daily_climbing_data['Success Rate'] = (daily_climbing_data['Succeeded'] / daily_climbing_data['Attempted']).clip(upper=1)

merged_data = pd.merge(daily_climbing_data, weather_data, on='Date', how='inner')
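
To show what the aggregation, filter, and cap do, here is the same sequence on a tiny made-up table (the rows are hypothetical and chosen to hit each case):

```python
import pandas as pd

# Hypothetical records: two routes on one day, a low-traffic day, and
# a day where recorded successes exceed attempts.
records = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-01", "2015-07-01", "2015-07-02", "2015-07-03"]),
    "Attempted": [8, 4, 3, 10],
    "Succeeded": [6, 3, 3, 12],
})

# Sum attempts and successes across routes for each day
daily = records.groupby("Date", as_index=False)[["Attempted", "Succeeded"]].sum()
# Keep only days with at least 10 attempts
daily = daily[daily["Attempted"] >= 10].copy()
# Cap the rate at 1 so that inconsistent records cannot exceed 100%
daily["Success Rate"] = (daily["Succeeded"] / daily["Attempted"]).clip(upper=1)

print(daily)
```

Summing before dividing gives a rate weighted by party size, which differs from averaging the per-row success percentages.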

Create Optimal Weather Conditions for Climbing Visualization

`When I asked my participants about my low-fi optimal weather conditions graph, they wanted me to separate the graphs and show one weather condition and success on each graph, as opposed to a single graph with multiple conditions`

In [11]:
base = alt.Chart(merged_data).encode(x='Date:T')

success_line_chart = base.mark_line(color='blue', interpolate='basis').encode(
    y=alt.Y('Success Rate:Q', scale=alt.Scale(domain=(0, 1))),
    tooltip=['Date:T', alt.Tooltip('Success Rate', format='.2%'), 'Attempted'],
)

temp_scatter_chart = base.mark_point(color='red', filled=True, size=100).encode(
    y='Temperature AVG:Q',
    tooltip=['Date:T', 'Temperature AVG']
)

wind_scatter_chart = base.mark_point(color='red', filled=True, size=100).encode(
    y='Wind Speed Daily AVG:Q',
    tooltip=['Date:T', 'Wind Speed Daily AVG']
)

humidity_scatter_chart = base.mark_point(color='red', filled=True, size=100).encode(
    y='Relative Humidity AVG:Q',
    tooltip=['Date:T', 'Relative Humidity AVG']
)

success_temp_final_chart = alt.layer(success_line_chart, temp_scatter_chart).resolve_scale(y='independent').properties(
    width=800,
    height=400,
    title='Success Rate and Temperature Over Time (Aggregated Daily)'
)

success_wind_final_chart = alt.layer(success_line_chart, wind_scatter_chart).resolve_scale(y='independent').properties(
    width=800,
    height=400,
    title='Success Rate and Wind Speed Over Time (Aggregated Daily)'
)

success_humidity_final_chart = alt.layer(success_line_chart, humidity_scatter_chart).resolve_scale(y='independent').properties(
    width=800,
    height=400,
    title='Success Rate and Humidity Over Time (Aggregated Daily)'
)

success_temp_final_chart & success_wind_final_chart & success_humidity_final_chart

Visually, there seems to be a clear correlation between success rate and temperature (as the temperature increases, success rates also generally increase).

But let's look at it statistically:

In [12]:
correlation_temp_success = joined_data['Temperature AVG'].corr(joined_data['Success Percentage'])
correlation_wind_success = joined_data['Wind Speed Daily AVG'].corr(joined_data['Success Percentage'])
correlation_humid_success = joined_data['Relative Humidity AVG'].corr(joined_data['Success Percentage'])

print("Correlation between temperature and success rates:", correlation_temp_success)
print("Correlation between wind speed and success rates:", correlation_wind_success)
print("Correlation between humidity and success rates:", correlation_humid_success)

Correlation between temperature and success rates: 0.11496989450080661
Correlation between wind speed and success rates: -0.10207452507569502
Correlation between humidity and success rates: -0.06911929522945648


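Interestingly, wind speed correlates with success percentage about as strongly as temperature does, only in the opposite direction. That makes sense, as a windier day would make it much harder to climb. Both coefficients are small, though (roughly 0.1 in magnitude), so the linear relationship is weak. Note that they are computed on the per-record `joined_data`, not on the daily aggregates plotted above.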
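
A single correlation coefficient does not show which ranges of a variable go with higher success. One way to look for such ranges is to bin the weather variable and compute a success rate per bin. The sketch below uses made-up daily values and arbitrary bin edges, so its numbers say nothing about the real data.

```python
import pandas as pd

# Hypothetical daily values; the bin edges are arbitrary.
days = pd.DataFrame({
    "Temperature AVG": [22.0, 28.0, 33.0, 38.0, 44.0, 51.0],
    "Attempted": [10, 12, 20, 25, 30, 40],
    "Succeeded": [1, 3, 8, 12, 18, 30],
})

days["Temp Bin"] = pd.cut(days["Temperature AVG"], bins=[20, 30, 40, 60])

# Sum attempts and successes within each bin, then divide, so that busy days
# carry more weight than days with only a handful of climbers.
by_bin = days.groupby("Temp Bin", observed=True)[["Attempted", "Succeeded"]].sum()
by_bin["Success Rate"] = by_bin["Succeeded"] / by_bin["Attempted"]

print(by_bin)
```

The same pattern would work for wind speed or humidity by changing the column and the bin edges.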

Last, I wanted to look at the weather trends of Mt. Rainier overall.

`In response to my low-fi weather trends graph, my participants (again) said I needed to split these out into their own separate graphs`

In [13]:

temperature_chart = alt.Chart(weather_data).mark_line().encode(
    x='Date:T',  
    y=alt.Y('Temperature AVG:Q', axis=alt.Axis(title='Temperature (°F)')),  # assumed °F: readings in the 40s are implausible as °C
    tooltip=['Date', 'Temperature AVG']  
).properties(
    width=300,
    height=200,
    title='Temperature Trends Over Time'
)

humidity_chart = alt.Chart(weather_data).mark_line().encode(
    x='Date:T',
    y=alt.Y('Relative Humidity AVG:Q', axis=alt.Axis(title='Relative Humidity (%)')),
    tooltip=['Date', 'Relative Humidity AVG']
).properties(
    width=300,
    height=200, 
    title='Relative Humidity Trends Over Time'
)

wind_speed_chart = alt.Chart(weather_data).mark_line().encode(
    x='Date:T',
    y=alt.Y('Wind Speed Daily AVG:Q', axis=alt.Axis(title='Wind Speed (mph)')),  # unit assumed to be mph; confirm against the dataset description
    tooltip=['Date', 'Wind Speed Daily AVG']
).properties(
    width=300,
    height=200,
    title='Wind Speed Trends Over Time'
)

temperature_chart | humidity_chart | wind_speed_chart

I was surprised to see that wind speeds were so low throughout the summer. I had assumed the wind would increase and follow the same trend as temperature. Humidity fluctuated widely from day to day, while temperature followed a predictable seasonal pattern.
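
For a series as noisy as humidity, a rolling mean can make any underlying trend easier to see. A minimal sketch with made-up readings follows; the 7-day window is an arbitrary choice, not something tuned to this dataset.

```python
import pandas as pd

# Hypothetical daily humidity readings for ten consecutive days.
humidity = pd.Series(
    [20.0, 80.0, 30.0, 70.0, 25.0, 75.0, 35.0, 65.0, 30.0, 70.0],
    index=pd.date_range("2015-07-01", periods=10, freq="D"),
    name="Relative Humidity AVG",
)

# A centered 7-day window; min_periods=1 keeps the edges instead of
# leaving them as NaN.
smoothed = humidity.rolling(window=7, center=True, min_periods=1).mean()

print(smoothed.round(1))
```

The smoothed series could be plotted as a second line over the raw values to show both the day-to-day swings and the overall direction.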

### Key Elements of Design and Justification:
My visualization design incorporates several key elements to ensure effectiveness and usability:

__Clarity and Simplicity:__ The visualizations are designed to be straightforward and easy to understand, catering to a wide range of users, including climbers, researchers, and expedition planners.

__Relevance to Stakeholders:__ The visualizations prioritize variables and insights that are directly relevant to stakeholders' decision-making, so that the results are actionable.

__Interactive Features:__ Interactive elements such as tooltips enhance user engagement and allow stakeholders to explore the data more deeply. By letting users interact with the visualizations, I give them a way to uncover patterns tailored to their specific interests.

__Visual Aesthetics:__ Design elements such as color palettes, typography, and layout are chosen to make the visualizations visually appealing as well as readable.

__Data Integrity and Accuracy:__ Ensuring the accuracy and integrity of the data presented in the visualizations is paramount to instill confidence in the insights derived from the visualizations.

***
### Final Evaluation Approach:
__People Involved:__ I work in a data analytics department, and I also climb at a climbing gym, so I recruited 3 colleagues and 2 members of my gym to look at both my low-fi sketches and the final dashboard.

__Measures:__
- Insight Depth: Evaluate the extent to which the visualization enhances our understanding of the correlations between climbing success and weather patterns.
- Use Cases: Determine the practical applicability of the visualization in facilitating informed decision-making or in developing predictive models for climbing success rates.

__Approach:__ I used a mixed methods approach with both quantitative and qualitative methods.
- Data Exploration and Visualization: I performed data exploration and built visualizations so my subjects could examine the dataset.
- User surveys: I received feedback from my subjects and made the appropriate changes for the final dashboard.

__Criteria for Success:__
- The visualizations need to provide clear and useful insights into what makes climbing trips successful.
- These insights should match what experts in climbing and weather know.
- People using the visualization should see it as helpful for making decisions, like figuring out the best weather for successful climbs.

*Overall, the visualization's success depends on how well it helps us understand what factors lead to successful climbs, possibly guiding future analyses or decisions.*
***
### Procedure:
__Data Preparation:__ I preprocessed the data by converting the date columns to datetimes, restricting both tables to 2015, joining them on date, and dropping rows with missing values. This step ensured that the data was in a suitable format for analysis.

__Exploratory Data Analysis (EDA):__ I explored the relationships between climbing and weather variables using correlation analysis and visualization techniques. This provided insights into which weather factors were most associated with summit success rates.

__Visualization Design:__ I designed informative and visually appealing charts and graphs to present the findings effectively. This included selecting appropriate chart types, color schemes, and tooltips to enhance understanding.

__Evaluation:__ I evaluated the visualizations using the approach outlined in the Final Evaluation Approach section. This involved both quantitative analysis and qualitative feedback from participants.
***
### Results:
Unfortunately, the EDA and visualizations I created did not tell a very exciting story. The only interesting findings were that, generally, success rates increased as the temperature rose and as the wind speed decreased.

In talking to my climbing participants, this was expected. When it is colder, it is harder to grip with your hands and feet, and you have to carry more gear. When it is windier, it is harder to climb.

***
### Synthesis of Findings:
With the above steps completed, our project provided a comprehensive overview of the relationship between weather conditions and summit success rates on Mount Rainier, offering valuable insights for climbers and expedition planning.

__TL;DR:__
It's best to climb in warm, not windy, weather.