
# Data Science Chapter 5: Writing Better Queries

When you work with large amounts of data, the ability to write effective queries is not just a skill; it's an art form that can significantly affect the success of your projects. **Efficient queries** are the cornerstone of data analysis, letting you extract meaningful insights from vast amounts of information quickly and precisely.

But why does "better" matter when it comes to writing queries? The answer lies in **three critical factors** that define the quality of a query:

1. *Performance*. Better queries run faster and consume fewer computational resources. In a world where time is money, this translates into cost savings and quicker insights.

2. *Accuracy*. Improved query writing ensures you're extracting exactly the data you need. This precision minimizes errors and misinterpretations that could lead to flawed analyses.

3. *Scalability*. As your data grows, well-crafted queries continue to perform efficiently, allowing your analyses to scale seamlessly.

Throughout this chapter, we'll explore techniques to enhance your query writing skills, focusing on data manipulation, optimization strategies, and best practices. By the end, you'll be equipped to craft queries that not only retrieve data but do so with elegance and efficiency.

Remember, in the world of big data, the difference between a good query and a great one can be the difference between drowning in information and surfing the waves of insight. Let's dive in and learn how to write better queries!


## Sample Data Set: Zombie Attacks!
In this chapter, we'll work with a data set about zombie attacks. Let's start by loading it and taking a look.

```python
# Download the data set (-q: no progress output; -nc: skip if the file already exists)
!wget https://github.com/brendanpshea/data-science/raw/main/data/zombie_attacks.csv -q -nc
```

```python
# Load the CSV file into a SQLite database
import sqlite3

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('zombie_attacks.csv')

# Write the DataFrame to a table named ZombieAttacks in zombie_attacks.db,
# replacing the table if it already exists and leaving out the DataFrame index
conn = sqlite3.connect('zombie_attacks.db')
df.to_sql('ZombieAttacks', conn, if_exists='replace', index=False)
conn.close()
```
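`df.to_sql` hides the table creation and row inserts. For readers curious about what happens underneath, or working without pandas, here is a minimal standard-library sketch of the same kind of load. It assumes an abbreviated, five-column version of the file (the rows are copied from the five-row preview later in this section) and a hypothetical helper named `load_csv`; it is an illustration, not how pandas itself implements `to_sql`.

```python
import csv
import io
import sqlite3

# Abbreviated copy of the first rows of zombie_attacks.csv (five columns only).
# To load the real file, pass open('zombie_attacks.csv', newline='') instead.
SAMPLE_CSV = """Date,Location,ZombieType,VictimCount,SurvivalRate
2023-02-24,Des Moines,Runner,25.5,0.2215
2023-09-29,Rochester,Walker,13.0,0.3739
2023-06-01,Rochester,Walker,12.0,0.1924
2023-02-14,St. Louis,Crawler,7.0,0.7949
2023-08-31,Winnipeg,Runner,39.0,0.0678
"""


def load_csv(conn, csv_file):
    """Create ZombieAttacks, insert every CSV row, and return the row count."""
    reader = csv.DictReader(csv_file)
    # Dropping first mimics to_sql(..., if_exists='replace').
    conn.execute("DROP TABLE IF EXISTS ZombieAttacks")
    # Columns declared REAL get "REAL affinity": numeric strings such as
    # '25.5' from the CSV are stored as floating-point numbers, not text.
    conn.execute(
        "CREATE TABLE ZombieAttacks ("
        "Date TEXT, Location TEXT, ZombieType TEXT, "
        "VictimCount REAL, SurvivalRate REAL)"
    )
    rows = [
        (r["Date"], r["Location"], r["ZombieType"], r["VictimCount"], r["SurvivalRate"])
        for r in reader
    ]
    # ? placeholders let sqlite3 handle quoting and escaping of each value.
    conn.executemany("INSERT INTO ZombieAttacks VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database; nothing is written to disk
print(load_csv(conn, io.StringIO(SAMPLE_CSV)))  # 5
print(conn.execute("SELECT typeof(VictimCount) FROM ZombieAttacks LIMIT 1").fetchone())  # ('real',)
conn.close()
```

Declaring column types up front is what makes SQLite store `VictimCount` as a number, which matters later when you sort, average, or compare it.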

### Getting to Know Our Data
Now, let's connect to the database and take a look at our data.

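```python
# Load (or reload) the %sql / %%sql magics
%reload_ext sql
# Return query results as pandas DataFrames
%config SqlMagic.autopandas = True
# Connect to the SQLite database file created above
%sql sqlite:///zombie_attacks.db
```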
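With `autopandas` enabled, every `%%sql` cell hands back a pandas DataFrame. Outside a notebook, a similar result comes from `pandas.read_sql_query` on an ordinary `sqlite3` connection. The sketch below assumes a tiny in-memory stand-in table (two rows copied from the sample data) rather than the real `zombie_attacks.db`.

```python
import sqlite3

import pandas as pd

# Stand-in table with two rows from the sample data (not the full data set).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZombieAttacks (Date TEXT, Location TEXT, VictimCount REAL)")
conn.executemany(
    "INSERT INTO ZombieAttacks VALUES (?, ?, ?)",
    [("2023-02-24", "Des Moines", 25.5), ("2023-09-29", "Rochester", 13.0)],
)

# read_sql_query runs the query and returns a DataFrame, much like a %%sql cell
# with autopandas on. The ? placeholder plus params keeps values out of the SQL
# string, and parse_dates converts the TEXT dates into pandas datetimes.
preview = pd.read_sql_query(
    "SELECT Date, Location, VictimCount FROM ZombieAttacks WHERE VictimCount > ?",
    conn,
    params=(10,),
    parse_dates=["Date"],
)
print(preview.dtypes)
print(preview)
conn.close()
```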

```sql
%%sql
-- Get the table schema: one row per column, with its declared type
PRAGMA table_info(ZombieAttacks);
```

| cid | name | type | notnull | dflt_value | pk |
| --- | --- | --- | --- | --- | --- |
| 0 | Date | TEXT | 0 |  | 0 |
| 1 | Location | TEXT | 0 |  | 0 |
| 2 | ZombieType | TEXT | 0 |  | 0 |
| 3 | VictimCount | REAL | 0 |  | 0 |
| 4 | SurvivalRate | REAL | 0 |  | 0 |
| 5 | WeatherCondition | TEXT | 0 |  | 0 |
| 6 | MoonPhase | TEXT | 0 |  | 0 |
| 7 | TemperatureCelsius | REAL | 0 |  | 0 |
| 8 | HumidityPercent | REAL | 0 |  | 0 |
| 9 | WindSpeedKmh | REAL | 0 |  | 0 |
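Outside a notebook, the same schema information is available from Python's `sqlite3` module. A minimal sketch, assuming a cut-down stand-in table and a hypothetical helper named `column_types`: it uses `pragma_table_info`, the table-valued form of the pragma (available in SQLite 3.16 and later), so the schema can be queried, sorted, and parameterized like any other table.

```python
import sqlite3


def column_types(conn, table):
    """Return a {column name: declared type} dict for one table."""
    # pragma_table_info() yields the same columns as PRAGMA table_info
    # (cid, name, type, notnull, dflt_value, pk); here we keep name and type.
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the real table, using declared types from the schema output.
conn.execute("CREATE TABLE ZombieAttacks (Date TEXT, Location TEXT, VictimCount REAL)")
print(column_types(conn, "ZombieAttacks"))
# {'Date': 'TEXT', 'Location': 'TEXT', 'VictimCount': 'REAL'}

# Example use: list the numeric (REAL) columns before writing aggregate queries.
real_cols = [c for c, t in column_types(conn, "ZombieAttacks").items() if t == "REAL"]
print(real_cols)  # ['VictimCount']
conn.close()
```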


```sql
%%sql
-- Preview five rows from the table
SELECT *
FROM ZombieAttacks
LIMIT 5;
```

| Date | Location | ZombieType | VictimCount | SurvivalRate | WeatherCondition | MoonPhase | TemperatureCelsius | HumidityPercent | WindSpeedKmh | PopulationDensity | EmergencyResponseTime | Month |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023-02-24 | Des Moines | Runner | 25.5 | 0.2215 | Foggy | Full Moon | 12.8 | 58.9 | 8.1 | 65.1 | 8.7 | 2 |
| 2023-09-29 | Rochester | Walker | 13.0 | 0.3739 | Stormy | Waxing Crescent | 23.2 | 46.4 | 35.6 | 594.5 | 4.0 | 9 |
| 2023-06-01 | Rochester | Walker | 12.0 | 0.1924 | Cloudy | New Moon | 22.6 | 31.9 | 9.9 | 236.4 | 13.2 | 6 |
| 2023-02-14 | St. Louis | Crawler | 7.0 | 0.7949 | Stormy | New Moon | 10.6 | 57.9 | 5.3 | 156.8 | 4.7 | 2 |
| 2023-08-31 | Winnipeg | Runner | 39.0 | 0.0678 | Sunny | Full Moon | 16.7 | 45.1 | 3.8 | 19.6 | 6.9 | 8 |
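`LIMIT 5` on its own returns whichever five rows SQLite happens to produce first; SQL guarantees no particular order without `ORDER BY`. When a preview should be reproducible, or when only a few columns matter, it is better to name the columns and the sort order explicitly. A minimal sketch, assuming an in-memory stand-in table holding four of the sample rows above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZombieAttacks (Date TEXT, Location TEXT, ZombieType TEXT, VictimCount REAL)"
)
# Four rows copied from the five-row preview (stand-in data, not the full table).
conn.executemany(
    "INSERT INTO ZombieAttacks VALUES (?, ?, ?, ?)",
    [
        ("2023-02-24", "Des Moines", "Runner", 25.5),
        ("2023-09-29", "Rochester", "Walker", 13.0),
        ("2023-02-14", "St. Louis", "Crawler", 7.0),
        ("2023-08-31", "Winnipeg", "Runner", 39.0),
    ],
)

# Name only the columns you need and make the order explicit:
# here, the three attacks with the most victims.
top_attacks = conn.execute(
    """
    SELECT Date, Location, VictimCount
    FROM ZombieAttacks
    ORDER BY VictimCount DESC
    LIMIT 3
    """
).fetchall()
for row in top_attacks:
    print(row)
# ('2023-08-31', 'Winnipeg', 39.0)
# ('2023-02-24', 'Des Moines', 25.5)
# ('2023-09-29', 'Rochester', 13.0)
conn.close()
```

Selecting named columns also protects downstream code if columns are later added to the table, as happened between the schema listing and this preview.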


### Data Dictionary for `zombie_attacks.csv`

| **Column Name** | **Data Type** | **Description** |
| --- | --- | --- |
| `Date` | `datetime` | The date of the recorded zombie attack, stored in SQLite as `TEXT` in `YYYY-MM-DD` format. |
| `Location` | `string` | The location where the zombie attack occurred, centered around major cities near Minneapolis. |
| `ZombieType` | `string` | The type of zombie involved in the attack, with possible values: 'Walker', 'Runner', 'Crawler', 'Jumper'. |
| `VictimCount` | `float` | The number of victims in the zombie attack. Although conceptually a count, it is stored as `REAL` and some values are fractional (e.g., `25.5`). |
| `SurvivalRate` | `float` | The survival rate of victims, represented as a proportion between 0 and 1. |
| `WeatherCondition` | `string` | The weather condition at the time of the attack, with possible values: 'Sunny', 'Rainy', 'Cloudy', 'Foggy', 'Stormy'. |
| `MoonPhase` | `string` | The phase of the moon at the time of the attack, with possible values: 'New Moon', 'Waxing Crescent', 'First Quarter', 'Waxing Gibbous', 'Full Moon', 'Waning Gibbous', 'Last Quarter', 'Waning Crescent'. |
| `TemperatureCelsius` | `float` | The temperature in degrees Celsius at the time of the attack, adjusted for weather conditions and location-specific patterns. |
| `HumidityPercent` | `float` | The humidity percentage at the time of the attack. |
| `WindSpeedKmh` | `float` | The wind speed in kilometers per hour at the time of the attack. |
| `PopulationDensity` | `float` | The population density of the location where the attack occurred. |
| `EmergencyResponseTime` | `float` | The time in minutes for emergency response to arrive at the scene of the attack. |
| `Month` | `integer` | The month of the year when the attack occurred, extracted from the `Date` column. |
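SQLite has no dedicated date type, so `Date` is stored as `TEXT` (as the `PRAGMA table_info` output shows). Because the values use the `YYYY-MM-DD` format, they sort and compare correctly as strings, and SQLite's `strftime` function can pull out parts such as the month. A minimal sketch, assuming an in-memory stand-in table built from sample rows, that recomputes `Month` from `Date` and filters on a date range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table: Date, Location, and Month values copied from the sample rows.
conn.execute("CREATE TABLE ZombieAttacks (Date TEXT, Location TEXT, Month INTEGER)")
conn.executemany(
    "INSERT INTO ZombieAttacks VALUES (?, ?, ?)",
    [
        ("2023-02-24", "Des Moines", 2),
        ("2023-09-29", "Rochester", 9),
        ("2023-06-01", "Rochester", 6),
        ("2023-02-14", "St. Louis", 2),
    ],
)

# strftime('%m', Date) returns the month as zero-padded text ('02'),
# so CAST it to INTEGER before comparing it with the stored Month column.
mismatches = conn.execute(
    """
    SELECT COUNT(*)
    FROM ZombieAttacks
    WHERE CAST(strftime('%m', Date) AS INTEGER) <> Month
    """
).fetchone()[0]
print(mismatches)  # 0

# ISO-formatted text compares in date order, so BETWEEN works as a date filter.
february = conn.execute(
    """
    SELECT Date, Location
    FROM ZombieAttacks
    WHERE Date BETWEEN '2023-02-01' AND '2023-02-28'
    ORDER BY Date
    """
).fetchall()
print(february)  # [('2023-02-14', 'St. Louis'), ('2023-02-24', 'Des Moines')]
conn.close()
```

String comparison only works this way because every date has the same zero-padded layout; a format like `2/24/2023` would sort incorrectly.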
