# Scenario

This project analyzes COVID-19 data from January 2020 to July 2023. Basic to advanced SQL queries are run to answer questions and identify trends throughout the pandemic. Once the data exploration in SQL is complete, the queries from SQL are used to create data visualizations in Tableau. **View my Visualizations** [here](https://public.tableau.com/app/profile/giankarlo.alvarado/viz/Covid-19ProjectVizualizations/Dashboard1)

# Data Exploration Business Task

The objective of this project is to analyze COVID-19 data from 2020 to 2023 using SQL queries. I focused on insights specific to the United States and also explored global and regional data.

# The Data

This dataset is from [Our World in Data](https://ourworldindata.org/covid-deaths).

# Skills Used

Skills used: Joins, CTEs, Window Functions, Aggregate Functions, Converting Data Types

# Analysis 

Analysis Through SQL:
Viewing the datasets

In [None]:
SELECT *
FROM `project-2-377803.Covid19_data.covid_vaccinations`
WHERE continent IS NOT NULL
ORDER BY date, population

In [None]:
SELECT *
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE continent IS NOT NULL
ORDER BY date, population

### Data Exploration - Covid Deaths Data

#### This shows the likelihood of dying in the United States if you contract covid

In [None]:
SELECT
location,
date,
total_cases,
total_deaths,
(total_deaths/total_cases)*100 as death_per_case
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE
location = 'United States'
ORDER BY
date DESC

[See Results Here](https://docs.google.com/spreadsheets/d/1htlbtvlLcdrz4BoaHtTXcYPDJ5o8-JTQis-cXL9L4cU/edit#gid=1294418776)

Findings: As of July 12, 2023, there are 103,436,829 total cases in the United States, with a 1.089% chance of dying if you contract COVID-19.
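BigQuery raises an error when a divisor is zero, so a more defensive version of this calculation can use `SAFE_DIVIDE`, which returns NULL instead. The following is a minimal sketch under the assumption that `total_cases` and `total_deaths` are numeric, as the query above implies; the `LIMIT 1` and the rounding are additions for illustration, not part of the original analysis:

```sql
-- Latest US death-per-case percentage, guarded against division by zero.
SELECT
  location,
  date,
  total_cases,
  total_deaths,
  -- SAFE_DIVIDE returns NULL (rather than failing) when total_cases is 0
  ROUND(SAFE_DIVIDE(total_deaths, total_cases) * 100, 3) AS death_per_case
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE location = 'United States'
  AND total_cases IS NOT NULL
ORDER BY date DESC
LIMIT 1; -- keep only the most recent row
```

Early rows of a pandemic dataset are where a zero or missing case count is most likely, which is why the guard may be worth having even if the original query ran without errors.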

#### This Query shows the percentage of the population in the United States that has contracted covid

In [None]:
-- Showing the percentage of population who got covid
SELECT location,
date,
population,
total_cases,
(total_cases/population) *100 as infected_percentage
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE location = 'United States'
ORDER BY date DESC

[See Results Here](https://docs.google.com/spreadsheets/d/1pqg8VS1Q02X8lK3mPGvQFd1fUl-D8lvRfJbPOlVQAX8/edit#gid=1553022584)

Findings: As of July 12, 2023, approximately 30.57% of the United States' population (~103M of ~338M people) has contracted COVID-19.

#### Analysis on the countries with the highest infection rate compared to the population

In [None]:
-- Looking at the countries with the highest infection rate compared to their population --
-- Main query to calculate COVID-19 infection statistics per location (country)
SELECT
location,
population,
MAX(total_cases) as highest_infection_count, 
MAX((total_cases/population)) * 100 as infected_population_percentage 
FROM `project-2-377803.Covid19_data.covid_deaths`
GROUP BY Location, continent, population -- Group the data by location, continent, and population
ORDER BY infected_population_percentage DESC; -- Sort the results in descending order of infected population percentage

[See Results Here](https://docs.google.com/spreadsheets/d/1zXqZl3hvsDwBym4PoHdl0gIbUgU6PJf5FIcROvjke2c/edit#gid=982406726)

Findings: Cyprus has the highest infection rate, with 73.75% (~660K) of its 896,007 population infected over the last 3 years.

#### Looking at the infection rate of the United States' population and its rank out of all countries

In [None]:
-- Query to find the highest infection rate compared to population in the USA
SELECT
location,
population,
MAX(total_cases) AS highest_infection_count, -- Calculate the highest infection count in the USA
MAX((total_cases/population)) * 100 AS infected_population_percentage -- Calculate the percentage of infected population in the USA
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE location = 'United States' -- Filter the data to include only the USA
GROUP BY location, population; -- Group the data by location (USA) and population

Findings: The US has an infection rate of 30.57%, ranking 69th out of 243.
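The rank quoted above is not produced by the query itself. Here is a minimal sketch of how it could be derived with a window function, assuming the same table and columns. The CTE names `infection_rates` and `ranked` and the output columns `infection_rank` and `locations_ranked` are illustrative and do not come from the original analysis:

```sql
WITH infection_rates AS (
  -- Peak infected percentage per location
  SELECT
    location,
    MAX(SAFE_DIVIDE(total_cases, population)) * 100 AS infected_population_percentage
  FROM `project-2-377803.Covid19_data.covid_deaths`
  GROUP BY location
),
ranked AS (
  -- Rank every location that has a computed percentage
  SELECT
    location,
    infected_population_percentage,
    RANK() OVER (ORDER BY infected_population_percentage DESC) AS infection_rank,
    COUNT(*) OVER () AS locations_ranked
  FROM infection_rates
  WHERE infected_population_percentage IS NOT NULL
)
SELECT *
FROM ranked
WHERE location = 'United States';
```

Like the country-level query earlier, this sketch does not filter on `continent IS NOT NULL`, so aggregate rows such as continents may be counted among the ranked locations. Adding that filter to the first CTE would restrict the ranking to countries.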

#### Analysis on which countries in the world had the highest death count

In [None]:
-- Countries with the highest death count --
SELECT location,
MAX(CAST(total_deaths AS INT)) as total_death_count
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE continent IS NOT NULL -- data was showing continents as countries --
GROUP BY location
ORDER BY total_death_count DESC

[See Results Here](https://docs.google.com/spreadsheets/d/1Yj8RstKCgTByW7nZAlksaerBzYZhGEduc_EV8ciZJvY/edit#gid=621481183)

Findings: The United States has the highest total death count at 1,127,152, followed by Brazil (704,159), and India (531,913).

#### Analysis on how many people were hospitalized or in the ICU in the United States

In [None]:
-- People who were hospitalized due to covid-19, as a share of total cases
SELECT
location,
date,
total_cases,
CAST(hosp_patients AS INT64) as hosp_patient,
(CAST(hosp_patients AS INT64) / total_cases) * 100 AS hosp_per_case
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE
location = 'United States'
ORDER BY date;

[See Results Here](https://docs.google.com/spreadsheets/d/1KRUOnFYMKGjCT4HppUGxHi0IgZtHrLb6btIgsQJH13k/edit#gid=1876284143)

Findings: The first hospitalization figures for the United States appear on 07/15/2020, when total cases stood at 3,442,977 and 33,760 patients were in hospital that day.

In [None]:
-- How many people were in the ICU?
SELECT
location,
date,
total_cases,
icu_patients,
(CAST(icu_patients AS INT64) / total_cases) * 100 AS icu_per_case
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE
location = 'United States'
ORDER BY date;

[See Results Here](https://docs.google.com/spreadsheets/d/1BCAJh6I-fQheJtCKYyXawCLqcmTnkw9DHvjQjROtX9Y/edit#gid=1527733154)

Findings: ICU figures first appear on the same date (07/15/2020), with 9,245 patients in intensive care.

Cases were recorded from 1/20/2020, but hospitalization and ICU figures do not appear in the data until 07/15/2020. This most likely reflects when reporting began rather than when people were first hospitalized.

#### Analysis on the continents with the highest death count

In [None]:
-- Continents with the highest death count --
SELECT continent,
MAX(CAST(total_deaths AS INT)) as total_death_count
FROM `project-2-377803.Covid19_data.covid_deaths`
WHERE continent IS NOT NULL -- data was showing continents as countries --
GROUP BY continent
ORDER BY total_death_count DESC

[See Results Here](https://docs.google.com/spreadsheets/d/1utp_XZOwM3QHZFPT0ZOWyEjlEdk1PK6lm8Ct-m_juAA/edit#gid=1055974193)

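Findings: The top values are 1,127,152 for North America, 704,159 for South America, and 531,913 for Asia. Because the query takes MAX(total_deaths) per continent, each figure is the highest single-country total in that continent (matching the United States, Brazil, and India above), not the sum across the continent's countries.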
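Since `total_deaths` is a cumulative per-country column, taking `MAX` grouped by continent returns the largest single-country value in each continent. One possible way to get a continent-wide total is to take each country's maximum first and then sum those values. This is a sketch assuming the same table; the CTE name `country_deaths` and the column `country_death_count` are illustrative:

```sql
WITH country_deaths AS (
  -- Latest cumulative death count per country
  SELECT
    continent,
    location,
    MAX(CAST(total_deaths AS INT64)) AS country_death_count
  FROM `project-2-377803.Covid19_data.covid_deaths`
  WHERE continent IS NOT NULL
  GROUP BY continent, location
)
SELECT
  continent,
  SUM(country_death_count) AS total_death_count -- sum of the per-country totals
FROM country_deaths
GROUP BY continent
ORDER BY total_death_count DESC;
```

The results of this variant would differ from the figures reported above, which come from the original query.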

#### Analysis on the global death percentage affected by Covid-19 

In [None]:
-- What is the overall death percentage? --
SELECT
SUM(new_cases) AS total_cases,
SUM(CAST(new_deaths AS INT)) AS total_deaths,
SUM(CAST(new_deaths AS INT))/SUM(new_cases)*100 AS global_death_percentage
FROM
`project-2-377803.Covid19_data.covid_deaths`
WHERE continent IS NOT NULL
ORDER BY 1,2

[See Results Here](https://docs.google.com/spreadsheets/d/1vaEOlYExpfcY7EKFWDMywddg-EiUMECxH049HhiPauk/edit#gid=964108467)

Findings: There are a total of 767,987,798 reported global cases and 6,957,132 COVID-related deaths, which is a global death percentage of roughly 0.9% of reported cases.
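The death percentage that this query computes can be checked directly from the two totals quoted in the findings. This is a small worked example using only those reported numbers:

```sql
-- Recompute the global death percentage from the totals quoted in the findings.
SELECT
  6957132 AS total_deaths,
  767987798 AS total_cases,
  ROUND(6957132 / 767987798 * 100, 3) AS global_death_percentage; -- roughly 0.906
```

In BigQuery, dividing two INT64 values with `/` yields a FLOAT64, so no explicit cast is needed here.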

###  Data Exploration - Covid Vaccinations Data

In [None]:
-- Viewing vaccination data --

SELECT *
FROM
`project-2-377803.Covid19_data.covid_vaccinations`
WHERE continent IS NOT NULL
ORDER BY date, location

#### The percent of the population that has been vaccinated in each country

In [None]:
-- Shows the percentage of population that has been vaccinated
SELECT
dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(CAST(vac.new_vaccinations AS INT64)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent IS NOT NULL

[See Results Here](https://docs.google.com/spreadsheets/d/1JyDA1WQ3r6kN5X0E13RY5xUYmsYzPV3NE-2qKYMOJic/edit#gid=1186868023)

#### This query calculates the rolling people vaccinated and the vaccinated percentage for each country

In [None]:
-- Using a Common Table Expression (CTE) to perform a calculation on the PARTITION BY result from the previous query
-- Query to calculate COVID-19 vaccination statistics per location and date
-- It calculates the rolling number of vaccinations and the vaccination percentage for each location on each date.


WITH population_vaccinated AS
(
-- Subquery to calculate the rolling number of people vaccinated per location and date
SELECT
dea.continent, -- Column: Continent (the continent to which the location belongs)
dea.location, -- Column: Location (the name of the location/country)
dea.date, -- Column: Date (the date of vaccination data)
dea.population, -- Column: Population (the total population of the location)
vac.new_vaccinations, -- Column: New_vaccinations (the number of new vaccinations on the date)
SUM(CAST(vac.new_vaccinations -- Calculate the rolling sum of new vaccinations using SUM() function
AS INT64)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea -- Source table for COVID-19 death data
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac -- Source table for COVID-19 vaccination data
ON dea.location = vac.location -- Join the tables on the 'location' column
AND dea.date = vac.date -- Match the data based on the 'date' column
WHERE
dea.continent IS NOT NULL -- Exclude records with NULL continent values
)


-- Main query to retrieve vaccination statistics per location and date
SELECT
continent, -- Column: Continent (the continent to which the location belongs)
location, -- Column: Location (the name of the location/country)
date, -- Column: Date (the date of vaccination data)
population, -- Column: Population (the total population of the location)
new_vaccinations, -- Column: New_vaccinations (the number of new vaccinations on the date)
rolling_ppl_vaccinated, -- Column: Rolling_People_Vaccinated (rolling number of people vaccinated)
(rolling_ppl_vaccinated / population) * 100 AS percent_vaccinated -- Calculate the vaccination percentage
FROM
population_vaccinated;

[See Results Here](https://docs.google.com/spreadsheets/d/1Jo5CGqMJ8lAMwCkQHE7CqoZI-_iml7L9BAc4G4p_pgA/edit#gid=1646615744)

New_vaccinations is the number of vaccine doses administered on that particular day. Rolling_ppl_vaccinated is the cumulative number of doses administered to date, and percent_vaccinated expresses that cumulative total as a percentage of the country's population. Because one person can receive several doses, percent_vaccinated can exceed 100%.

#### The percentage of people vaccinated within the United States population

In [None]:
WITH population_vaccinated AS
(
-- Subquery to calculate the rolling number of people vaccinated per location and date
SELECT
dea.continent, -- Column: Continent (the continent to which the location belongs)
dea.location, -- Column: Location (the name of the location/country)
dea.date, -- Column: Date (the date of vaccination data)
dea.population, -- Column: Population (the total population of the location)
vac.new_vaccinations, -- Column: New_vaccinations (the number of new vaccinations on the date)
SUM(CAST(vac.new_vaccinations AS INT64)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea -- Source table for COVID-19 death data
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac -- Source table for COVID-19 vaccination data
ON dea.location = vac.location -- Join the tables on the 'location' column
AND dea.date = vac.date -- Match the data based on the 'date' column
WHERE
dea.continent IS NOT NULL -- Exclude records with NULL continent values
)


-- Main query to retrieve vaccination statistics for the "United States" location
SELECT
continent, -- Column: Continent (the continent to which the location belongs)
location, -- Column: Location (the name of the location/country)
date, -- Column: Date (the date of vaccination data)
population, -- Column: Population (the total population of the location)
new_vaccinations, -- Column: New_vaccinations (the number of new vaccinations on the date)
rolling_ppl_vaccinated, -- Column: Rolling_People_Vaccinated (rolling number of people vaccinated)
(rolling_ppl_vaccinated / population) * 100 AS percent_vaccinated -- Calculate the vaccination percentage
FROM
population_vaccinated
WHERE
location = 'United States'; -- Filter to include only data for the "United States" location

[See Results Here](https://docs.google.com/spreadsheets/d/1CK7ahQ7PkLp7sZiUvj0pq73wkb-yJuWWy0hSoLyy5Ks/edit#gid=1299517625)

#### Common Table Expression of the percentage of each country’s population that is vaccinated as of today (07/12/2023)

In [None]:
/* What percentage of each country's population is vaccinated as of today (07/12/2023)? */


WITH population_vaccinated AS
(
/* Subquery to calculate the rolling number of people vaccinated per location and date */
SELECT
dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(CAST(vac.new_vaccinations AS INT64)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE
dea.continent IS NOT NULL
)


SELECT
Location,
Date,
Population,
New_vaccinations,
Rolling_ppl_Vaccinated,
(Rolling_ppl_Vaccinated/Population)*100 AS Percent_vaccinated
FROM
population_vaccinated
WHERE
Date = '2023-07-12'
ORDER BY percent_vaccinated DESC;

[See Results Here](https://docs.google.com/spreadsheets/d/1g7fC7k3ZwRic0a0FzFdzyFNPi2rLR8t9veXK7vUDabY/edit#gid=1341191250)

#### Analysis on the vaccination rate of the US population

In [None]:
-- Percent vaccinated vs. population in the US as of 07/12/23
-- Main query to retrieve COVID-19 vaccination statistics for the United States on a specific date (2023-07-12)
SELECT
Location,
Date,
Population,
New_vaccinations,
Rolling_People_Vaccinated,
(Rolling_People_Vaccinated / Population) * 100 AS Percent_vaccinated -- Calculate vaccination percentage
FROM (
-- Subquery to calculate rolling COVID-19 vaccination totals for all locations (countries) and dates
SELECT
dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(CAST(vac.new_vaccinations AS NUMERIC)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS Rolling_People_Vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac ON dea.location = vac.location
AND dea.date = vac.date
WHERE
dea.continent IS NOT NULL -- Exclude any data without a continent (country) specified
)
WHERE
Date = DATE('2023-07-12') -- Filter data for the specified date (2023-07-12)
AND Location = 'United States' -- Filter data for the United States only
ORDER BY Date;

[See Results Here](https://docs.google.com/spreadsheets/d/1y9pFgG5lSRifg3Y5mH_U1QKlAsm7ip2br9nNFwikVJE/edit#gid=622020671)

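Findings: Cumulative vaccinations equal about 200% of the US population. The figure exceeds 100% because new_vaccinations counts doses administered, so it corresponds to roughly two doses per person rather than the share of people vaccinated.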
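To estimate the share of people vaccinated rather than doses per person, one option is a per-person column. The sketch below assumes the vaccinations table kept the `people_vaccinated` column from the Our World in Data export. That column is not used anywhere in the original queries, so its presence and data type here are assumptions, and `SAFE_CAST` is used defensively:

```sql
-- Share of the US population with at least one dose (assumes a people_vaccinated column exists).
SELECT
  dea.location,
  dea.population,
  MAX(SAFE_CAST(vac.people_vaccinated AS FLOAT64)) AS people_vaccinated,
  ROUND(
    SAFE_DIVIDE(MAX(SAFE_CAST(vac.people_vaccinated AS FLOAT64)), dea.population) * 100,
    2
  ) AS percent_people_vaccinated
FROM `project-2-377803.Covid19_data.covid_deaths` dea
JOIN `project-2-377803.Covid19_data.covid_vaccinations` vac
  ON dea.location = vac.location
  AND dea.date = vac.date
WHERE dea.location = 'United States'
GROUP BY dea.location, dea.population;
```

`MAX` is used because a cumulative column is often blank on days without a report, so the latest date is not guaranteed to hold a value.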

#### Analysis of the percentage of vaccinations in each continent as of 07/12/2023

In [None]:
/*
This query calculates COVID-19 vaccination statistics per continent on a specified date (2023-07-12).

1. CTE percent_population_vaccinated: Calculates the rolling number of vaccinations per location (country) from the COVID-19 death and vaccination data.

2. CTE highest_vaccinations_by_continent: Determines the highest vaccination count for each continent on the specified date. (Not referenced by the main query.)

3. Main Query: Gathers the total population, total vaccinations, total rolling vaccinations, and vaccination percentage per continent on the specified date.
*/


WITH percent_population_vaccinated AS (
SELECT
dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(CAST(vac.new_vaccinations AS NUMERIC)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac ON dea.location = vac.location
AND dea.date = vac.date
WHERE
dea.continent IS NOT NULL
),
highest_vaccinations_by_continent AS (
SELECT
Continent,
MAX(New_vaccinations) AS highest_vaccination_count
FROM
percent_population_vaccinated
WHERE
date = '2023-07-12' AND
Continent IS NOT NULL
GROUP BY Continent
ORDER BY highest_vaccination_count DESC
)
SELECT
continent, -- Change location to continent
date,
SUM(population) AS total_population, -- Sum population per continent
SUM(new_vaccinations) AS total_vaccinations, -- Sum new_vaccinations per continent
SUM(rolling_ppl_vaccinated) AS total_rolling_vaccinations, -- Sum rolling_ppl_vaccinated per continent
(SUM(rolling_ppl_vaccinated) / SUM(population)) * 100 AS Percent_vaccinated -- Calculate percentage per continent
FROM
percent_population_vaccinated
WHERE
date = '2023-07-12'
GROUP BY continent, date
ORDER BY date;

[See Results Here](https://docs.google.com/spreadsheets/d/1F3NJfYBsTz6FF9pELDPzvMGrlqJOhX76aUvMBL5x-Ws/edit#gid=1011341243)

Findings: South America has the highest percentage at 159%, followed by Europe at 147%; Africa has the lowest at 8%. Values above 100% are possible because the rolling total counts doses, not people.

#### The Percentage of the world that is vaccinated

In [None]:
/* What percentage of the world is vaccinated? */


WITH population_vaccinated AS
(
/* Subquery to calculate the rolling number of people vaccinated per location and date */
SELECT
dea.continent,
dea.location,
dea.date,
dea.population,
vac.new_vaccinations,
SUM(CAST(vac.new_vaccinations AS INT64)) OVER (PARTITION BY dea.location ORDER BY dea.date) AS rolling_ppl_vaccinated
FROM
`project-2-377803.Covid19_data.covid_deaths` dea
JOIN
`project-2-377803.Covid19_data.covid_vaccinations` vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE
dea.continent IS NOT NULL
)


SELECT
SUM(New_vaccinations) AS total_vaccinations,
(SUM(CAST(New_vaccinations AS BIGINT))/SUM(Population))*100 AS global_vacc_percentage
FROM
population_vaccinated
WHERE continent IS NOT NULL

[See Results Here](https://docs.google.com/spreadsheets/d/12JjipQZU-EJA7OfnWfy_MaWyaTFq4QUSL0iOba8bbjs/edit#gid=1209156072)
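One thing to watch in the query above: `population_vaccinated` has one row per location per date, so `SUM(Population)` adds each country's population once for every day in the data. The self-contained sketch below uses made-up sample rows (locations `A` and `B` and all numbers are hypothetical) to contrast that calculation with one that counts each location's population once:

```sql
WITH sample_rows AS (
  -- Hypothetical data: two locations, two days each
  SELECT * FROM UNNEST([
    STRUCT('A' AS location, DATE '2023-01-01' AS date, 100 AS population, 10 AS new_vaccinations),
    STRUCT('B', DATE '2023-01-01', 300, 30),
    STRUCT('A', DATE '2023-01-02', 100, 20),
    STRUCT('B', DATE '2023-01-02', 300, 60)
  ])
),
per_location AS (
  -- Collapse to one row per location so population is counted once
  SELECT
    location,
    MAX(population) AS population,
    SUM(new_vaccinations) AS doses
  FROM sample_rows
  GROUP BY location
)
SELECT
  -- Population summed over every daily row: 120 / 800 * 100 = 15.0
  (SELECT ROUND(SUM(new_vaccinations) / SUM(population) * 100, 2) FROM sample_rows) AS daily_row_percentage,
  -- Population counted once per location: 120 / 400 * 100 = 30.0
  (SELECT ROUND(SUM(doses) / SUM(population) * 100, 2) FROM per_location) AS per_location_percentage;
```

Either way, the numerator is a count of doses, so the result reads as doses per 100 people rather than the percentage of people vaccinated.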

# Key Findings
* As of July 12, 2023, there are 103,436,829 total cases in the United States, with a 1.089% chance of dying if you contract COVID-19.
* As of July 12, 2023, approximately 30.57% of the United States' population (~103M of ~338M people) has contracted COVID-19.
* Cyprus has the highest infection rate, with 73.75% (~660K) of its population of 896,007 infected over the last 3 years.
* The US has an infection rate of 30.57%, ranking 69th out of 243 countries.
* The United States has the highest total death count at 1,127,152, followed by Brazil (704,159) and India (531,913).
* US hospitalization figures first appear on 07/15/2020, with total cases at 3,442,977 and 33,760 patients in hospital that day.
* By continent, the highest single-country death counts are in North America (1,127,152), South America (704,159), and Asia (531,913).
* There are a total of 767,987,798 reported global cases and 6,957,132 COVID-related deaths, which is a global death percentage of roughly 0.9% of reported cases.
* 10,849,274,710 vaccine doses have been administered worldwide (new_vaccinations counts doses, not people). The query reports this as 10%, but it divides by population summed across every daily row, so that figure should not be read as the share of the global population vaccinated.