# DoingDataAnalysisRight
Course materials for Doing Data Analysis Right

**Inspiration for the Course**
- https://developers.google.com/machine-learning/guides/good-data-analysis
- https://youtu.be/_ZEWDGpM-vM
- https://github.com/owid/owid-datasets/tree/master

**Prerequisites**
- Familiarity with a programming language, preferably Python. The code is written to be accessible, so if you are proficient in another language you should be able to follow along just fine.
- Basic understanding of descriptive statistics such as the mean, median, and standard deviation.
- A keen interest in the methods and art of doing data analysis.
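
As a quick self-check on the statistics prerequisite, here is a minimal sketch using only Python's standard library. The numbers are made up for illustration and are not course data; the last value is a deliberate outlier to show how differently the mean and median react.

```python
import statistics

# Made-up values for illustration only; the last one is a deliberate outlier.
values = [2.0, 4.0, 4.0, 5.0, 100.0]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # barely affected by the outlier
std_value = statistics.stdev(values)      # sample standard deviation (n - 1)

print(f"mean={mean_value}, median={median_value}, std={std_value:.2f}")
```

If the gap between the mean and the median here makes sense to you, you have the background the course assumes.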

**References**
- https://docs.python.org/3.9/reference/index.html
- https://pandas.pydata.org/docs/whatsnew/v1.5.3.html#
- https://plotly.com/python/plotly-express/

After finishing this course, you will be able to analyze a complex dataset of your choosing to inform better decision making. The course is built around an illustrated end-to-end example, with clear and concise descriptions of how and why to perform each step of the analysis process and, ultimately, how to communicate your results to a non-technical audience.

# Introduction
Welcome to the world of data analysis, where deriving truth and insight from a vast sea of data is a powerful yet challenging endeavor. As data analysts and data-minded engineers, our mission is to develop a reputation for making credible pronouncements from data. But what exactly sets apart the best analysts from the rest? You might have heard adjectives like "careful" and "methodical," but what do these experts actually do to earn that praise?

This question becomes particularly interesting when we consider the type of data we regularly encounter, which is much like the data analyzed at Google. Our datasets are not only massive but also incredibly rich, with each data point comprising numerous attributes. Combine this complexity with temporal sequences of events for a single user, and we're faced with an overwhelming number of ways to analyze the data. Unlike typical academic psychology experiments, where researchers can examine every single data point with ease, our large, high-dimensional datasets present unique challenges that require a distinct approach.

Good data analysis is indispensable, not just for tech giants like Google but for businesses and organizations worldwide. For instance, in the medical field, analyzing patient data can lead to groundbreaking treatment discoveries. In finance, data-driven insights can identify potential market trends and risks. In climate science, data analysis helps predict and mitigate the impact of natural disasters.

The consequences of poor data analysis can be dire, too. Misinterpreting data may lead to flawed conclusions, impacting critical decision-making. Consider a transportation company that failed to analyze customer feedback data effectively. As a result, they missed an opportunity to address recurring complaints about service delays, leading to a decline in customer satisfaction and loyalty.

In short, data analysis done right is not just about crunching numbers; it's about meticulousness, curiosity, and the ability to communicate effectively. Throughout this course, we'll equip you with the tools and mindset to approach data analysis confidently, allowing you to make credible pronouncements that drive real-world impact. So, let's embark on this exciting journey together and unleash the true potential of data analysis!

# Our Working Example
In this course, we will work through an example that addresses one of the most pressing questions of our time: how can we transition to a greener energy profile while accounting for the trade-offs involved in land use?

As we confront the challenges of global warming, reshaping our energy production is a paramount concern. The shift from traditional energy sources to more sustainable yet potentially land-intensive alternatives demands careful analysis and strategic decision-making. Our working example will revolve around quantifying the land requirements for this transition, providing essential insights for informed investments in renewable energy.

By exploring the delicate balance between environmental impact and energy sustainability, you will develop the skills to navigate complex energy datasets and draw meaningful conclusions. Let's join forces to understand the data-driven pathways to a greener and more sustainable future. Together, we can make a tangible difference in combating climate change and shaping a cleaner world for generations to come.

# Course Overview
In this course, we will embark on a journey through three crucial aspects of data analysis that will equip you with the skills to tackle these challenges with confidence and precision. 

**1. Technical: Unleashing the Power of Data Manipulation and Examination**

We'll start by mastering the technical tools and techniques necessary to manipulate and examine data effectively. From data wrangling and cleaning to data visualization and statistical analysis, you'll learn how to unleash the potential hidden within the data. 

**2. Process: Navigating the Data Analysis Journey**

Next, we'll guide you through a structured process to approach your data analysis. We'll explore the right questions to ask, how to define clear objectives, and what essential checks to perform to ensure the integrity of your analysis. 
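
To make "essential checks" concrete, here is a minimal sketch of the kind of integrity check we have in mind, assuming pandas is installed. The toy table and the `basic_checks` helper are hypothetical and exist only for this illustration; they are not part of the course code.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the course dataset).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2000, 2000, 2001],
    "population": [10.0, 10.0, None, 21.0],
})


def basic_checks(frame, key_columns):
    """Return a few simple integrity indicators for a DataFrame."""
    return {
        # How many rows did we actually load?
        "rows": len(frame),
        # Missing values per column.
        "missing_per_column": frame.isna().sum().to_dict(),
        # Rows that repeat an earlier combination of the key columns.
        "duplicate_keys": int(frame.duplicated(subset=key_columns).sum()),
    }


report = basic_checks(toy, ["country", "year"])
print(report)
```

Checks like these are cheap to run right after loading data, and they often surface problems (a repeated country-year pair, an unexpectedly empty column) before those problems can distort a chart or a summary statistic.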

**3. Mindset: Collaboration and Communication for Impactful Insights**

Data analysis is not a solitary endeavor. It requires collaboration and effective communication to translate complex findings into actionable insights. We'll delve into the art of presenting data-driven conclusions to stakeholders who may not be data literate.

```python
#!pip install -q -r requirements.txt
```

```python
import pandas as pd
import plotly.express as px
from analysis_includes.includes import get_url_input
import sweetviz as sv
```

```python
# https://bit.ly/44KB6d3
cb = pd.read_csv(get_url_input())
```

Please enter the data URL from video: https://bit.ly/44KB6d3


```python
cb.head()
```

| Unnamed: 0 | column | description | source |
|---|---|---|---|
| 0 | country | Geographic location | Our World in Data |
| 1 | year | Year of observation | Our World in Data |
| 2 | iso_code | ISO 3166-1 alpha-3 three-letter country codes | International Organization for Standardization |
| 3 | population | Population | Calculated by Our World in Data based on diffe... |
| 4 | gdp | Total real gross domestic product, inflation-a... | Maddison Project Database |
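
A codebook like this is most useful as a lookup table. The sketch below rebuilds three of the rows shown above by hand (rather than downloading the file) and indexes them by column name, assuming pandas is installed. The variable names `codebook` and `descriptions` are our own choices for this illustration.

```python
import pandas as pd

# Three codebook rows copied from the output above, typed in by hand.
codebook = pd.DataFrame({
    "column": ["country", "year", "iso_code"],
    "description": [
        "Geographic location",
        "Year of observation",
        "ISO 3166-1 alpha-3 three-letter country codes",
    ],
})

# Index the descriptions by column name so each one can be looked up directly.
descriptions = codebook.set_index("column")["description"]

print(descriptions["year"])
```

With the full codebook loaded as `cb`, the same `set_index("column")` pattern should let you look up any column of the main dataset before you analyze it.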


```python
# https://bit.ly/3qe6IZi
df = pd.read_csv(get_url_input())
```

Please enter the data URL from video: https://bit.ly/3qe6IZi


```python
my_report = sv.analyze(df[df.columns[0:20]])
my_report.show_html()
```

Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
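
If the HTML report does not open, or you just want a quick look without leaving the notebook, a rough overview can be built with pandas alone. This is a minimal sketch on a hypothetical toy table, not a replacement for the Sweetviz report; it reuses the slicing idea from the cell above, taking the first three columns here instead of the first twenty.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the course dataset).
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2001, 2002],
    "population": [1.0, None, 3.0],
    "gdp": [5.0, 6.0, 7.0],
})

# Select the leading columns by position, as in df[df.columns[0:20]].
first_columns = toy[toy.columns[0:3]]

# One row per selected column: its type, missing count, and distinct values.
overview = pd.DataFrame({
    "dtype": first_columns.dtypes.astype(str),
    "missing": first_columns.isna().sum(),
    "unique": first_columns.nunique(),
})

print(overview)
```

Note that `nunique()` ignores missing values by default, so a column with one gap and two distinct values reports 2, not 3.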
