# Portfolio Project 1 - SQL Data Exploration

The goal of this project is to showcase how to perform data exploration using SQL on a database of COVID-19 cases, deaths, and vaccinations, worldwide and by country. The dataset was downloaded from [Our World in Data](https://ourworldindata.org/covid-deaths) on August 16, 2021, and contains data through August 15, 2021.

Here are the questions we want to answer with the data exploration:
- Number of cases, deaths and percentage of deaths per case in the world
- Minimum and maximum percentage of deaths per case in the world
- Rolling mean of new cases and new deaths in the world
- Countries with highest percentage of the population infected
- Countries with highest number of total deaths and their percentage of deaths per case
- When the first vaccination occurred
- Evolution of the vaccination and the percentage of the population vaccinated in the world
- The percentage of the population vaccinated in the 10 most populous countries in the world

This project was inspired by a [video from Alex the Analyst](https://www.youtube.com/watch?v=qfyynHBFOsM). Unlike in the video, we could only use a .csv file to import the dataset into Microsoft SQL Server, so we had to run a series of extra queries not covered there. These queries are in the file data_preparation.sql.

The skills used in this project include: Joins, CTEs, Temp Tables, Window Functions, Aggregate Functions, and Creating Views.

## 1. Setting up a connection

First, we have to load the ipython-sql library and set up a connection to the server and the database where the data is stored.

```
%%capture
%load_ext sql
%sql mssql://@CLA\SQLEXPRESS/PortfolioProject1?driver=SQL+Server
```

## 2. Understanding the data

Now it's time to take a look at the tables in our database.

```sql
%%sql
SELECT *
FROM INFORMATION_SCHEMA.TABLES;
```

| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE |
|---|---|---|---|
| PortfolioProject1 | dbo | CovidDeaths | BASE TABLE |
| PortfolioProject1 | dbo | CovidVaccinations | BASE TABLE |


We have two tables: CovidDeaths, with data on COVID-19 cases and deaths, and CovidVaccinations, with data on testing and vaccination.

Let's take a look at a sample of the CovidDeaths table.

```sql
%%sql
SELECT TOP 5 *
FROM CovidDeaths
WHERE location = 'Brazil';
```

```
iso_code,continent,location,date,population,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million
BRA,South America,Brazil,2020-02-26,212559409.0,1.0,1.0,,,,,0.005,0.005,,,,,,,,,,,,,
BRA,South America,Brazil,2020-02-27,212559409.0,1.0,0.0,,,,,0.005,0.0,,,,,,,,,,,,,
BRA,South America,Brazil,2020-02-28,212559409.0,1.0,0.0,,,,,0.005,0.0,,,,,,,,,,,,,
BRA,South America,Brazil,2020-02-29,212559409.0,2.0,1.0,,,,,0.009,0.005,,,,,,,,,,,,,
BRA,South America,Brazil,2020-03-01,212559409.0,2.0,0.0,,,,,0.009,0.0,,,,,,,,,,,,,
```


In the CovidDeaths table, the first few columns identify the location of each row: an ISO code, the continent, and the location (usually a country). Next come the date of the data point and the population of that location. The remaining columns hold COVID-19 indicators, including total cases, new cases, total deaths, new deaths, total cases per million, reproduction rate, and number of hospitalized patients.

For this initial exploration project, we'll focus on the data related to the number of cases and deaths.

Now we'll see what is stored in the CovidVaccinations table.

```sql
%%sql
SELECT TOP 5 *
FROM CovidVaccinations
WHERE location = 'Canada'
ORDER BY date DESC;
```

```
iso_code,continent,location,date,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,new_vaccinations_smoothed_per_million,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality
CAN,North America,Canada,2021-08-15,,,,,,,,,,51500860.0,27396288.0,24104572.0,98586.0,140174.0,136.45,72.59,63.87,3714.0,,4.037,41.4,16.984,10.797,44017.591,0.5,105.599,7.37,12.0,16.6,,2.5,82.43,0.929,
CAN,North America,Canada,2021-08-14,,,,,,,,,,51402274.0,27376948.0,24025326.0,111290.0,139950.0,136.19,72.54,63.66,3708.0,,4.037,41.4,16.984,10.797,44017.591,0.5,105.599,7.37,12.0,16.6,,2.5,82.43,0.929,
CAN,North America,Canada,2021-08-13,,,,,,,,,,51290984.0,27356765.0,23934219.0,148440.0,141949.0,135.9,72.48,63.42,3761.0,60.65,4.037,41.4,16.984,10.797,44017.591,0.5,105.599,7.37,12.0,16.6,,2.5,82.43,0.929,
CAN,North America,Canada,2021-08-12,73972.0,39199157.0,1038.604,1.96,61612.0,1.632,0.026,38.0,tests performed,51142544.0,27321586.0,23820958.0,153735.0,145900.0,135.51,72.39,63.11,3866.0,60.65,4.037,41.4,16.984,10.797,44017.591,0.5,105.599,7.37,12.0,16.6,,2.5,82.43,0.929,
CAN,North America,Canada,2021-08-11,55134.0,39125185.0,1036.644,1.461,58793.0,1.558,0.026,38.3,tests performed,50988809.0,27288715.0,23700094.0,138442.0,151146.0,135.1,72.3,62.79,4005.0,60.65,4.037,41.4,16.984,10.797,44017.591,0.5,105.599,7.37,12.0,16.6,,2.5,82.43,0.929,
```


The first few columns are the same as in the CovidDeaths table: iso_code, continent, location, and date. Next come data on the number of tests performed to detect COVID-19 and on vaccination, including the total number of vaccinations, people vaccinated, and people fully vaccinated. Several other columns describe risk factors for the severity of the disease, such as the percentage of the population aged 65 or older, diabetes prevalence, and the number of hospital beds per thousand people.

From this table, we'll focus on the number of vaccinations worldwide and per country.

## 3. Exploring the data

__GLOBAL NUMBERS: CASES AND DEATHS__

Before looking at the global number of cases and deaths due to COVID-19, we need a way to select the data for the entire world.

When the continent column is NULL, the location column does not hold a country but an aggregate, such as a continent, a group of countries, or the whole world. So we'll first list the names of those locations to choose the correct one for the global numbers.

```sql
%%sql
SELECT DISTINCT location
FROM CovidDeaths
WHERE continent IS NULL;
```

| location |
|---|
| Africa |
| Asia |
| Europe |
| European Union |
| International |
| North America |
| Oceania |
| South America |
| World |


Now we know we can use the location "World" as our filter. 
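
To see why these aggregate rows matter, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table and its numbers are hypothetical: two countries (CountryA, CountryB) plus the continent and world rows that carry a NULL continent. Summing over every row counts each case several times, which is why country-level queries filter on continent IS NOT NULL and global queries read the 'World' row.

```python
import sqlite3

# Hypothetical stand-in for CovidDeaths: two countries plus the aggregate
# rows (continent = NULL) that Our World in Data stores alongside them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CovidDeaths (continent TEXT, location TEXT, total_cases REAL)")
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?)",
    [
        ("Europe", "CountryA", 100.0),
        ("Europe", "CountryB", 300.0),
        (None, "Europe", 400.0),  # continent aggregate
        (None, "World", 400.0),   # global aggregate
    ],
)

# Summing every row counts each case three times (country + Europe + World).
naive_total = conn.execute("SELECT SUM(total_cases) FROM CovidDeaths").fetchone()[0]

# Countries only: aggregate rows are excluded by continent IS NOT NULL.
country_total = conn.execute(
    "SELECT SUM(total_cases) FROM CovidDeaths WHERE continent IS NOT NULL"
).fetchone()[0]

# Global figure read directly from the 'World' row.
world_total = conn.execute(
    "SELECT total_cases FROM CovidDeaths WHERE location = 'World'"
).fetchone()[0]

print(naive_total, country_total, world_total)  # 1200.0 400.0 400.0
```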

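We'll query the total cases, the total deaths, and the ratio of deaths to cases, which we'll call DeathPercentage. Note that in this query and the next one, DeathPercentage is expressed as a fraction rather than multiplied by 100, so 0.03 means 3%.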

```sql
%%sql
SELECT
    location, 
    date, 
    total_cases, 
    total_deaths, 
    ROUND(total_deaths / total_cases, 2) AS DeathPercentage
FROM CovidDeaths
WHERE location = 'World';
```

| location | date | total_cases | total_deaths | DeathPercentage |
|---|---|---|---|---|
| World | 2020-01-22 | 557.0 | 17.0 | 0.03 |
| World | 2020-01-23 | 655.0 | 18.0 | 0.03 |
| World | 2020-01-24 | 941.0 | 26.0 | 0.03 |
| World | 2020-01-25 | 1433.0 | 42.0 | 0.03 |
| World | 2020-01-26 | 2118.0 | 56.0 | 0.03 |
| World | 2020-01-27 | 2927.0 | 82.0 | 0.03 |
| World | 2020-01-28 | 5578.0 | 131.0 | 0.02 |
| World | 2020-01-29 | 6167.0 | 133.0 | 0.02 |
| World | 2020-01-30 | 8235.0 | 171.0 | 0.02 |
| World | 2020-01-31 | 9927.0 | 213.0 | 0.02 |
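
The division itself deserves a note. In SQL Server, as in SQLite, dividing one integer by another truncates the result, so a deaths-to-cases ratio computed on integer columns would come out as 0. The columns here appear to hold floating-point values (557.0, 17.0), so the query above is not affected, but a CAST would be needed if they had been imported as integers. The sketch below uses Python's sqlite3 module and reuses the first 'World' row from the output above, deliberately stored as INTEGER for illustration, to show the truncated ratio, the fraction, and the value multiplied by 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First 'World' row from the output above, stored as INTEGER on purpose
# to show what would happen with whole-number columns.
conn.execute("CREATE TABLE WorldTotals (total_cases INTEGER, total_deaths INTEGER)")
conn.execute("INSERT INTO WorldTotals VALUES (557, 17)")

int_ratio, ratio, pct = conn.execute(
    """
    SELECT
        total_deaths / total_cases,                               -- integer division
        ROUND(CAST(total_deaths AS REAL) / total_cases, 2),       -- fraction
        ROUND(CAST(total_deaths AS REAL) / total_cases * 100, 2)  -- percentage
    FROM WorldTotals
    """
).fetchone()
print(int_ratio, ratio, pct)  # 0 0.03 3.05
```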


Let's also quickly check the minimum and maximum values of DeathPercentage, to better understand how it changed over time.

```sql
%%sql
WITH DeathPer AS
(
    SELECT 
        location, 
        date, 
        total_cases, 
        total_deaths, 
        total_deaths / total_cases AS DeathPercentage
    FROM CovidDeaths
    WHERE location = 'World'
)
SELECT 
    location, 
    date, 
    total_cases, 
    total_deaths, 
    ROUND(DeathPercentage, 2) AS DeathPercentage
FROM DeathPer
WHERE (
    DeathPercentage = (
        SELECT
            MAX(DeathPercentage)
        FROM DeathPer)
    OR 
    DeathPercentage = (
        SELECT
            MIN(DeathPercentage)
        FROM DeathPer)
)
ORDER BY date;
```

| location | date | total_cases | total_deaths | DeathPercentage |
|---|---|---|---|---|
| World | 2020-02-05 | 27643.0 | 564.0 | 0.02 |
| World | 2020-04-29 | 3198651.0 | 235053.0 | 0.07 |


The minimum DeathPercentage (about 0.02, i.e. 2%) occurred early in the pandemic, on February 5, 2020. The peak (about 0.07, i.e. 7%) came less than three months later, on April 29, 2020.
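
The CTE-plus-subquery pattern above can be tried out on a toy table. This sketch uses Python's sqlite3 module in place of SQL Server and three hypothetical days of data; the structure of the query mirrors the notebook's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths (location TEXT, date TEXT, total_cases REAL, total_deaths REAL)"
)
# Hypothetical data: ratios of 0.03, 0.01 and 0.07.
conn.executemany(
    "INSERT INTO CovidDeaths VALUES ('World', ?, ?, ?)",
    [("2020-02-01", 100.0, 3.0), ("2020-02-02", 200.0, 2.0), ("2020-02-03", 100.0, 7.0)],
)

rows = conn.execute(
    """
    WITH DeathPer AS (
        SELECT date, total_deaths / total_cases AS DeathPercentage
        FROM CovidDeaths
        WHERE location = 'World'
    )
    SELECT date, DeathPercentage
    FROM DeathPer
    WHERE DeathPercentage = (SELECT MAX(DeathPercentage) FROM DeathPer)
       OR DeathPercentage = (SELECT MIN(DeathPercentage) FROM DeathPer)
    ORDER BY date
    """
).fetchall()
print(rows)  # [('2020-02-02', 0.01), ('2020-02-03', 0.07)]
```

If several days tie for the minimum or maximum, this pattern returns all of them.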

__GLOBAL NUMBERS: ROLLING MEAN OF NEW CASES AND NEW DEATHS__

Next, we'll calculate a 14-day rolling mean of new cases and new deaths for later use in a visualization. A rolling mean helps [smooth out short-term fluctuations](https://en.wikipedia.org/wiki/Moving_average), which makes it easier to tell whether new cases and deaths are increasing or decreasing.
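
One detail worth checking is the window size: ROWS BETWEEN n PRECEDING AND CURRENT ROW covers n + 1 rows, so ROWS BETWEEN 14 PRECEDING AND CURRENT ROW averages 15 days, while a 14-day window is ROWS BETWEEN 13 PRECEDING AND CURRENT ROW. The sketch below checks this with Python's sqlite3 module (window functions need SQLite 3.25 or later) on 20 hypothetical days of new cases, and compares the SQL result with a plain Python rolling mean.

```python
import sqlite3

# Hypothetical daily new cases for 20 consecutive days: 1.0, 2.0, ..., 20.0
daily = [(f"2020-01-{d:02d}", float(d)) for d in range(1, 21)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE World (date TEXT, new_cases REAL)")
conn.executemany("INSERT INTO World VALUES (?, ?)", daily)

# How many rows does "14 PRECEDING AND CURRENT ROW" actually cover?
window_rows_14_preceding = conn.execute(
    """
    SELECT MAX(n) FROM (
        SELECT COUNT(*) OVER (
            ORDER BY date ROWS BETWEEN 14 PRECEDING AND CURRENT ROW
        ) AS n
        FROM World
    )
    """
).fetchone()[0]
print(window_rows_14_preceding)  # 15, one row more than 14 days

# A 14-day rolling mean: 13 preceding rows plus the current one
rolling = conn.execute(
    """
    SELECT date,
           AVG(new_cases) OVER (
               ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
           ) AS RollingNewCases
    FROM World
    ORDER BY date
    """
).fetchall()

# Cross-check against plain Python: mean of the last (up to) 14 values
values = [v for _, v in daily]
expected = []
for i in range(len(values)):
    window = values[max(0, i - 13): i + 1]
    expected.append(sum(window) / len(window))
assert [r[1] for r in rolling] == expected
print(rolling[-1])  # ('2020-01-20', 13.5)
```

With one row per day and no gaps, rows and days coincide; if some dates were missing, a row-based window would span more than 14 calendar days.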

```sql
%%sql
DROP VIEW IF EXISTS RollingMean;
```

```sql
%%sql
CREATE VIEW RollingMean AS
SELECT
    location, 
    date, 
    -- 13 PRECEDING + CURRENT ROW = 14 rows, i.e. a 14-day window
    ROUND(AVG(new_cases) OVER (
        ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW), 0) AS RollingNewCases, 
    ROUND(AVG(new_deaths) OVER (
        ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW), 0) AS RollingNewDeaths
FROM CovidDeaths
WHERE location = 'World';
```

```sql
%%sql
-- a view has no guaranteed row order, so sort explicitly
SELECT *
FROM RollingMean
ORDER BY date;
```

| location | date | RollingNewCases | RollingNewDeaths |
|---|---|---|---|
| World | 2020-01-22 | 0.0 | 0.0 |
| World | 2020-01-23 | 49.0 | 1.0 |
| World | 2020-01-24 | 128.0 | 3.0 |
| World | 2020-01-25 | 219.0 | 6.0 |
| World | 2020-01-26 | 312.0 | 8.0 |
| World | 2020-01-27 | 395.0 | 11.0 |
| World | 2020-01-28 | 717.0 | 16.0 |
| World | 2020-01-29 | 701.0 | 15.0 |
| World | 2020-01-30 | 853.0 | 17.0 |
| World | 2020-01-31 | 937.0 | 20.0 |


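We could have done the same thing using a table instead of a view, as shown below. Note that RollingMeanTable is a regular table saved in the database; a true temporary table in SQL Server has a name starting with # (e.g., #RollingMeanTable) and only exists for the session that created it.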

```sql
%%sql
DROP TABLE IF EXISTS RollingMeanTable;
```

```sql
%%sql
CREATE TABLE RollingMeanTable
(
    location varchar(50), 
    date date, 
    RollingNewCases float, 
    RollingNewDeaths float
);

INSERT INTO RollingMeanTable
SELECT
    location, 
    date, 
    -- 13 PRECEDING + CURRENT ROW = 14 rows, i.e. a 14-day window
    ROUND(AVG(new_cases) OVER (
        ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW), 0) AS RollingNewCases, 
    ROUND(AVG(new_deaths) OVER (
        ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW), 0) AS RollingNewDeaths
FROM CovidDeaths
WHERE location = 'World';
```

572 rows affected.

```sql
%%sql
SELECT *
FROM RollingMeanTable
ORDER BY date;
```

| location | date | RollingNewCases | RollingNewDeaths |
|---|---|---|---|
| World | 2020-01-22 | 0.0 | 0.0 |
| World | 2020-01-23 | 49.0 | 1.0 |
| World | 2020-01-24 | 128.0 | 3.0 |
| World | 2020-01-25 | 219.0 | 6.0 |
| World | 2020-01-26 | 312.0 | 8.0 |
| World | 2020-01-27 | 395.0 | 11.0 |
| World | 2020-01-28 | 717.0 | 16.0 |
| World | 2020-01-29 | 701.0 | 15.0 |
| World | 2020-01-30 | 853.0 | 17.0 |
| World | 2020-01-31 | 937.0 | 20.0 |
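
The practical difference between the two approaches is that a view stores only the query and is re-evaluated every time it is read, while a table stores a snapshot of the rows as they were when inserted. The sketch below shows this with Python's sqlite3 module, using SQLite's CREATE TEMP TABLE syntax (in SQL Server the equivalent would be a table whose name starts with #) and a hypothetical row added after both objects were created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CovidDeaths (location TEXT, date TEXT, new_cases REAL)")
conn.executemany(
    "INSERT INTO CovidDeaths VALUES ('World', ?, ?)",
    [("2020-01-22", 0.0), ("2020-01-23", 98.0)],
)

# A view stores only the query; it is re-run every time it is read.
conn.execute(
    "CREATE VIEW WorldCases AS "
    "SELECT date, new_cases FROM CovidDeaths WHERE location = 'World'"
)
# A temporary table stores a copy of the rows and is dropped when the connection closes.
conn.execute(
    "CREATE TEMP TABLE WorldCasesSnapshot AS "
    "SELECT date, new_cases FROM CovidDeaths WHERE location = 'World'"
)

# Hypothetical new row arriving after both objects were created.
conn.execute("INSERT INTO CovidDeaths VALUES ('World', '2020-01-24', 286.0)")

view_rows = conn.execute("SELECT COUNT(*) FROM WorldCases").fetchone()[0]
snapshot_rows = conn.execute("SELECT COUNT(*) FROM WorldCasesSnapshot").fetchone()[0]
print(view_rows, snapshot_rows)  # 3 2
```

A view is convenient when the underlying data keeps changing; a table avoids recomputing the window functions on every read.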


__ANALYSIS AT COUNTRY LEVEL: CASES AND DEATHS__

Next, we'll look at a few indicators per country.

First, we'll find the countries with the highest number of confirmed cases relative to their population.

```sql
%%sql
SELECT TOP 10
    location, 
    population, 
    MAX(total_cases) AS HighestInfectionCount, 
    ROUND((MAX(total_cases) / population) * 100, 2) AS CasesPopPercentage
FROM CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY CasesPopPercentage DESC;
```

| location | population | HighestInfectionCount | CasesPopPercentage |
|---|---|---|---|
| Andorra | 77265.0 | 14924.0 | 19.32 |
| Seychelles | 98340.0 | 18895.0 | 19.21 |
| Montenegro | 628062.0 | 106196.0 | 16.91 |
| Bahrain | 1701583.0 | 270919.0 | 15.92 |
| Czechia | 10708982.0 | 1676222.0 | 15.65 |
| San Marino | 33938.0 | 5194.0 | 15.3 |
| Maldives | 540542.0 | 79137.0 | 14.64 |
| Slovenia | 2078932.0 | 261428.0 | 12.58 |
| Cyprus | 888005.0 | 108872.0 | 12.26 |
| Georgia | 3989175.0 | 481578.0 | 12.07 |


The highest share of the population with a confirmed case is about 19%, in the small country of Andorra. Most of the countries that follow also have small populations (Czechia, with about 10.7 million people, is the largest in the list). One possible explanation is that extreme percentages are easier to reach in small populations.
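
A toy example shows how the ranking works: the same GROUP BY/MAX pattern, run with Python's sqlite3 module on two hypothetical countries (Smallland and Bigland), ranks the small country first even though the large one has far more cases. SQLite has no TOP, so LIMIT is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths (continent TEXT, location TEXT, population REAL, total_cases REAL)"
)
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?)",
    [
        ("Europe", "Smallland", 1000.0, 10.0),
        ("Europe", "Smallland", 1000.0, 120.0),
        ("Asia", "Bigland", 100000.0, 500.0),
        ("Asia", "Bigland", 100000.0, 2000.0),
        (None, "World", 101000.0, 2120.0),  # aggregate row, filtered out below
    ],
)

ranking = conn.execute(
    """
    SELECT location,
           MAX(total_cases) AS HighestInfectionCount,
           ROUND(MAX(total_cases) / population * 100, 2) AS CasesPopPercentage
    FROM CovidDeaths
    WHERE continent IS NOT NULL
    GROUP BY location, population
    ORDER BY CasesPopPercentage DESC
    LIMIT 10  -- SQLite's counterpart of SELECT TOP 10
    """
).fetchall()
print(ranking)  # [('Smallland', 120.0, 12.0), ('Bigland', 2000.0, 2.0)]
```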

Now, let's take a look at countries which have the highest number of deaths due to COVID-19 and the percentage of deaths regarding the number of cases.

```sql
%%sql
SELECT TOP 10
    location, 
    population, 
    MAX(total_deaths) AS HighestDeathCount, 
    ROUND((MAX(total_deaths) / MAX(total_cases)) * 100, 2) AS DeathsCasesPercentage
FROM CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location, population
ORDER BY HighestDeathCount DESC;
```

| location | population | HighestDeathCount | DeathsCasesPercentage |
|---|---|---|---|
| United States | 331002647.0 | 621635.0 | 1.69 |
| Brazil | 212559409.0 | 569058.0 | 2.79 |
| India | 1380004385.0 | 431642.0 | 1.34 |
| Mexico | 128932753.0 | 248167.0 | 8.03 |
| Peru | 32971846.0 | 197393.0 | 9.25 |
| Russia | 145934460.0 | 167595.0 | 2.57 |
| United Kingdom | 67886004.0 | 131269.0 | 2.08 |
| Italy | 60461828.0 | 128432.0 | 2.89 |
| Colombia | 50882884.0 | 123459.0 | 2.54 |
| Indonesia | 273523621.0 | 117588.0 | 3.05 |


Several of the most populous countries (the United States, Brazil, and India) appear at the top of the list, with the highest absolute numbers of deaths due to COVID-19. However, the percentage of deaths per case varies considerably within the top 10. For example, the United States has the highest number of deaths but one of the lowest percentages of deaths per case, about 1.69%. Mexico and Peru, on the other hand, rank 4th and 5th in total deaths but would top the list by percentage, with 8.03% and 9.25%.

These differences may have several causes, such as the conditions and capacity of each health care system and the underreporting of cases.

To take a closer look at the total cases and deaths in the United States, we can run the following query:

```sql
%%sql
SELECT 
    location, 
    date, 
    total_cases, 
    total_deaths, 
    ROUND((total_deaths / total_cases) * 100, 2) AS DeathPercentage
FROM CovidDeaths
-- exact match: a pattern such as LIKE '%states%' could also match other locations
WHERE location = 'United States'
ORDER BY location, date;
```

| location | date | total_cases | total_deaths | DeathPercentage |
|---|---|---|---|---|
| United States | 2020-01-22 | 1.0 | | |
| United States | 2020-01-23 | 1.0 | | |
| United States | 2020-01-24 | 2.0 | | |
| United States | 2020-01-25 | 2.0 | | |
| United States | 2020-01-26 | 5.0 | | |
| United States | 2020-01-27 | 5.0 | | |
| United States | 2020-01-28 | 5.0 | | |
| United States | 2020-01-29 | 6.0 | | |
| United States | 2020-01-30 | 6.0 | | |
| United States | 2020-01-31 | 8.0 | | |


__GLOBAL NUMBERS: VACCINATION__

To start the analysis on the vaccination data, we'll first join the two base tables from our database: CovidDeaths and CovidVaccinations.

```sql
%%sql
SELECT TOP 5 *
FROM CovidDeaths dea
JOIN CovidVaccinations vac
    ON dea.location = vac.location
    AND dea.date = vac.date;
```

```
iso_code,continent,location,date,population,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,iso_code_1,continent_1,location_1,date_1,new_tests,total_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,new_vaccinations_smoothed_per_million,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality
BHS,North America,Bahamas,2020-08-02,393248.0,648.0,49.0,43.714,14.0,0.0,0.429,1647.815,124.603,111.162,35.601,0.0,1.09,1.49,,,,,,,,,BHS,North America,Bahamas,2020-08-02,,,,,,,,,,,,,,,,,,,81.94,39.497,34.3,8.996,5.2,27717.847,,235.954,13.17,3.1,20.4,,2.9,73.92,0.814,
BHS,North America,Bahamas,2020-08-03,393248.0,679.0,31.0,42.429,14.0,0.0,0.429,1726.646,78.831,107.893,35.601,0.0,1.09,1.46,,,,,,,,,BHS,North America,Bahamas,2020-08-03,,,,,,,,,,,,,,,,,,,81.94,39.497,34.3,8.996,5.2,27717.847,,235.954,13.17,3.1,20.4,,2.9,73.92,0.814,
BHS,North America,Bahamas,2020-08-04,393248.0,715.0,36.0,38.286,14.0,0.0,0.429,1818.191,91.545,97.358,35.601,0.0,1.09,1.43,,,,,,,,,BHS,North America,Bahamas,2020-08-04,,,,,,,,,,,,,,,,,,,90.74,39.497,34.3,8.996,5.2,27717.847,,235.954,13.17,3.1,20.4,,2.9,73.92,0.814,
BHS,North America,Bahamas,2020-08-05,393248.0,751.0,36.0,38.143,14.0,0.0,0.429,1909.736,91.545,96.994,35.601,0.0,1.09,1.41,,,,,,,,,BHS,North America,Bahamas,2020-08-05,,,,,,,,,,,,,,,,,,,90.74,39.497,34.3,8.996,5.2,27717.847,,235.954,13.17,3.1,20.4,,2.9,73.92,0.814,
BHS,North America,Bahamas,2020-08-06,393248.0,761.0,10.0,36.143,14.0,0.0,0.0,1935.166,25.429,91.909,35.601,0.0,0.0,1.39,,,,,,,,,BHS,North America,Bahamas,2020-08-06,,,,,,,,,,,,,,,,,,,90.74,39.497,34.3,8.996,5.2,27717.847,,235.954,13.17,3.1,20.4,,2.9,73.92,0.814,
```
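
Both join keys are needed because each table has one row per location and date. Joining on location alone would pair every date in one table with every date in the other. The sketch below, using Python's sqlite3 module and a hypothetical location (CountryA) with two days of data, counts the rows produced by each version of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CovidDeaths (location TEXT, date TEXT, population REAL)")
conn.execute("CREATE TABLE CovidVaccinations (location TEXT, date TEXT, people_vaccinated REAL)")

days = ["2021-08-14", "2021-08-15"]
conn.executemany("INSERT INTO CovidDeaths VALUES ('CountryA', ?, 100.0)", [(d,) for d in days])
conn.executemany(
    "INSERT INTO CovidVaccinations VALUES ('CountryA', ?, ?)",
    [(days[0], 60.0), (days[1], 61.0)],
)

# Correct: one output row per (location, date) pair.
both_keys = conn.execute(
    """SELECT COUNT(*) FROM CovidDeaths dea
       JOIN CovidVaccinations vac
         ON dea.location = vac.location AND dea.date = vac.date"""
).fetchone()[0]

# Wrong: every date is matched with every other date of the same location.
location_only = conn.execute(
    """SELECT COUNT(*) FROM CovidDeaths dea
       JOIN CovidVaccinations vac ON dea.location = vac.location"""
).fetchone()[0]

print(both_keys, location_only)  # 2 4
```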


Now we'll look at how vaccination progressed worldwide since the first vaccination was recorded. To filter the data from that day on, we first need to find out when it was.

```sql
%%sql
SELECT TOP 5
    location, 
    date
FROM CovidVaccinations
WHERE total_vaccinations IS NOT NULL
ORDER BY date ASC;
```

| location | date |
|---|---|
| World | 2020-12-02 |
| Norway | 2020-12-02 |
| Europe | 2020-12-02 |
| World | 2020-12-03 |
| Norway | 2020-12-03 |


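The earliest records with a non-NULL total_vaccinations value are dated December 2, 2020, so we'll use that date as the starting point for the vaccination series.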
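
A non-NULL value is not the same as a positive one: in the World series below, people_vaccinated is still 0 on December 2 and 3, 2020. The sketch below, with Python's sqlite3 module and hypothetical data for CountryA and CountryB, shows how MIN(date) can give different answers depending on whether we filter on IS NOT NULL or on > 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CovidVaccinations (location TEXT, date TEXT, total_vaccinations REAL)")
conn.executemany(
    "INSERT INTO CovidVaccinations VALUES (?, ?, ?)",
    [
        ("CountryA", "2020-12-01", None),  # no data yet
        ("CountryB", "2020-12-02", 0.0),   # reported, but zero
        ("CountryA", "2020-12-14", 5.0),   # first actual doses
    ],
)

# First date with any reported value (even 0)
first_date = conn.execute(
    "SELECT MIN(date) FROM CovidVaccinations WHERE total_vaccinations IS NOT NULL"
).fetchone()[0]

# First date with at least one reported dose
first_positive = conn.execute(
    "SELECT MIN(date) FROM CovidVaccinations WHERE total_vaccinations > 0"
).fetchone()[0]

print(first_date, first_positive)  # 2020-12-02 2020-12-14
```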

Next, we'll join the two tables again and look at the percentage of the world population that has received at least one vaccine dose (the people_vaccinated column). A separate column, people_fully_vaccinated, counts people who are fully vaccinated, since some vaccines require two doses.

```sql
%%sql
SELECT
    vac.location, 
    vac.date, 
    vac.people_vaccinated, 
    dea.population, 
    ROUND((vac.people_vaccinated / dea.population) * 100, 2) AS PopVaccinatedPercentage
FROM CovidVaccinations vac
JOIN CovidDeaths dea
    ON vac.location = dea.location
    AND vac.date = dea.date
WHERE vac.location = 'World'
AND vac.date > '2020-12-01'
ORDER BY vac.date;
```

| location | date | people_vaccinated | population | PopVaccinatedPercentage |
|---|---|---|---|---|
| World | 2020-12-02 | 0.0 | 7794798729.0 | 0.0 |
| World | 2020-12-03 | 0.0 | 7794798729.0 | 0.0 |
| World | 2020-12-04 | 1.0 | 7794798729.0 | 0.0 |
| World | 2020-12-05 | 1.0 | 7794798729.0 | 0.0 |
| World | 2020-12-06 | 1.0 | 7794798729.0 | 0.0 |
| World | 2020-12-07 | 2.0 | 7794798729.0 | 0.0 |
| World | 2020-12-08 | 2.0 | 7794798729.0 | 0.0 |
| World | 2020-12-09 | 2.0 | 7794798729.0 | 0.0 |
| World | 2020-12-10 | 2.0 | 7794798729.0 | 0.0 |
| World | 2020-12-11 | 2.0 | 7794798729.0 | 0.0 |


As of August 15, 2021, about 31% of the world's population had received at least one dose.
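
A figure like this one can be read directly from the most recent row instead of from the end of the full output. The sketch below, again with Python's sqlite3 module and hypothetical numbers, joins the two tables on location and date and keeps only the latest day; in SQL Server, SELECT TOP 1 ... ORDER BY vac.date DESC would play the role of LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CovidDeaths (location TEXT, date TEXT, population REAL)")
conn.execute("CREATE TABLE CovidVaccinations (location TEXT, date TEXT, people_vaccinated REAL)")

# Hypothetical three days of data for the 'World' aggregate
days = ["2021-08-13", "2021-08-14", "2021-08-15"]
conn.executemany("INSERT INTO CovidDeaths VALUES ('World', ?, 1000.0)", [(d,) for d in days])
conn.executemany(
    "INSERT INTO CovidVaccinations VALUES ('World', ?, ?)",
    zip(days, [240.0, 245.0, 250.0]),
)

latest = conn.execute(
    """
    SELECT vac.date,
           ROUND(vac.people_vaccinated / dea.population * 100, 2) AS PopVaccinatedPercentage
    FROM CovidVaccinations vac
    JOIN CovidDeaths dea
      ON vac.location = dea.location AND vac.date = dea.date
    WHERE vac.location = 'World'
    ORDER BY vac.date DESC
    LIMIT 1  -- only the most recent day
    """
).fetchone()
print(latest)  # ('2021-08-15', 25.0)
```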

__ANALYSIS PER COUNTRY (TOP 10 MOST POPULOUS COUNTRIES): VACCINATION__

And what about the most populous countries? What percentage of each country's population has received at least one dose, and what percentage is fully vaccinated (relevant when the vaccines used require two doses)?

```sql
%%sql
SELECT TOP 10
    vac.location, 
    dea.population,
    MAX(vac.people_vaccinated) AS PeopleVaccinated, 
    ROUND((MAX(vac.people_vaccinated) / dea.population) * 100, 2) AS PopVaccinatedPercentage, 
    MAX(vac.people_fully_vaccinated) AS PeopleFullyVaccinated, 
    ROUND((MAX(vac.people_fully_vaccinated) / dea.population) * 100, 2) AS PopFullyVaccinatedPercentage
FROM CovidVaccinations vac
JOIN CovidDeaths dea
    ON vac.location = dea.location
    AND vac.date = dea.date
WHERE vac.continent IS NOT NULL
GROUP BY vac.location, dea.population
ORDER BY dea.population DESC, PopFullyVaccinatedPercentage DESC;
```

| location | population | PeopleVaccinated | PopVaccinatedPercentage | PeopleFullyVaccinated | PopFullyVaccinatedPercentage |
|---|---|---|---|---|---|
| China | 1439323774.0 | 622000000.0 | 43.21 | 777046000.0 | 53.99 |
| India | 1380004385.0 | 422575401.0 | 30.62 | 121270889.0 | 8.79 |
| United States | 331002647.0 | 198088722.0 | 59.85 | 168362058.0 | 50.86 |
| Indonesia | 273523621.0 | 53573831.0 | 19.59 | 28037059.0 | 10.25 |
| Pakistan | 220892331.0 | 34343317.0 | 15.55 | 12104373.0 | 5.48 |
| Brazil | 212559409.0 | 119793423.0 | 56.36 | 49629214.0 | 23.35 |
| Nigeria | 206139587.0 | 2550390.0 | 1.24 | 1416623.0 | 0.69 |
| Bangladesh | 164689383.0 | 15567318.0 | 9.45 | 5351022.0 | 3.25 |
| Russia | 145934460.0 | 40731280.0 | 27.91 | 32432870.0 | 22.22 |
| Mexico | 128932753.0 | 54305039.0 | 42.12 | 29239686.0 | 22.68 |


There are considerable differences between countries, both in the percentage of the population with at least one dose and in the percentage fully vaccinated.

China, the most populous country in the world, has 43% of its population with at least one dose, while the United States has almost 60%. China's fully vaccinated figure (about 54%) is higher than its at-least-one-dose figure, which cannot be true of current data; the two columns were probably last reported on different dates. Brazil is not far behind the United States, with about 56% of the population vaccinated with at least one dose, but only 23% are fully vaccinated.

Among the 10 most populous countries, Bangladesh and Nigeria have the lowest numbers, below 10% for both percentages.
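
China's row above shows more people fully vaccinated than vaccinated with at least one dose. One way to check whether a country's at-least-one-dose figure is stale is to look up the date of its most recent non-NULL people_vaccinated value with ROW_NUMBER(), which SQL Server also supports. The sketch below uses Python's sqlite3 module and a hypothetical country (CountryA) that stopped reporting people_vaccinated in June: its latest value dates from 2021-06-01, while MAX(people_fully_vaccinated) comes from August and ends up larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidVaccinations (location TEXT, date TEXT, "
    "people_vaccinated REAL, people_fully_vaccinated REAL)"
)
# Hypothetical data: people_vaccinated is no longer reported after June.
conn.executemany(
    "INSERT INTO CovidVaccinations VALUES ('CountryA', ?, ?, ?)",
    [
        ("2021-06-01", 600.0, 300.0),
        ("2021-07-01", None, 500.0),
        ("2021-08-15", None, 700.0),
    ],
)

# Latest non-NULL people_vaccinated per location, together with its date
latest = conn.execute(
    """
    WITH ranked AS (
        SELECT location, date, people_vaccinated,
               ROW_NUMBER() OVER (PARTITION BY location ORDER BY date DESC) AS rn
        FROM CovidVaccinations
        WHERE people_vaccinated IS NOT NULL
    )
    SELECT location, date, people_vaccinated FROM ranked WHERE rn = 1
    """
).fetchall()

fully = conn.execute(
    "SELECT MAX(people_fully_vaccinated) FROM CovidVaccinations WHERE location = 'CountryA'"
).fetchone()[0]

print(latest, fully)  # [('CountryA', '2021-06-01', 600.0)] 700.0
```

Reporting the date next to each value makes it easy to spot countries whose figures are out of date before comparing them.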