Dan Thompson, Xi Chen, Alvaro Romero, Mahah Sadique, Guangbin Yu and Marie Dujardin
In our final group project for the Data Visualization course, we aim to explore, analyze, and tell a compelling story using various datasets related to COVID-19 in South Korea. Our goal is to create an original and informative project that sheds light on the pandemic's impact in the region.
Examples of the research questions we will address include:
- How did government policies affect the spread and control of COVID-19 in South Korea? We aim to visualize policy changes over time and correlate them with infection rates.
- What were the major infection sources and clusters, and how did they contribute to the spread of the pandemic?
- How aware of and interested in COVID-19 was the public during different phases of the pandemic? We will perform a keyword trend analysis to gauge public interest.
- How did demographic factors such as age and gender influence the pandemic's impact? We will analyze patient data to answer this question.
- What were the population's mobility patterns, and how did they correlate with infection rates? We will visualize floating population data alongside infection statistics.
For this purpose, we plan to use the 11 datasets available at https://www.kaggle.com/datasets/kimjihoo/coronavirusdataset.
In addition to data visualization techniques, our project will involve preprocessing steps, including merging the different datasets. This integration will let us explore relationships between variables that come from different datasets.
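Here is a minimal sketch of the merge step, using only the Python standard library. It joins per-city case totals onto regional statistics. It assumes both tables have already been read (for example with `csv.DictReader`) into lists of dictionaries that share `province` and `city` keys. These key names, the `confirmed` column, the helper names, and the toy rows are assumptions for illustration and have not been checked against the Kaggle files. In practice, pandas `DataFrame.merge` would likely perform the join in a single call.

```python
from collections import defaultdict


def total_cases_by_city(case_rows):
    """Sum confirmed counts per (province, city), since one city may appear in several cluster rows."""
    totals = defaultdict(int)
    for row in case_rows:
        totals[(row["province"], row["city"])] += int(row["confirmed"])
    return dict(totals)


def merge_with_regions(case_totals, region_rows):
    """Left-join per-city case totals onto regional statistics by (province, city).

    Every region is kept; regions without any case rows get confirmed = 0.
    """
    return [
        {**region, "confirmed": case_totals.get((region["province"], region["city"]), 0)}
        for region in region_rows
    ]


def unmatched_keys(case_totals, region_rows):
    """List case keys with no matching region row; the join above would silently drop them."""
    region_keys = {(r["province"], r["city"]) for r in region_rows}
    return sorted(key for key in case_totals if key not in region_keys)


if __name__ == "__main__":
    # Hypothetical toy rows; values are strings, as csv.DictReader returns them.
    cases = [
        {"province": "A", "city": "x", "confirmed": "3"},
        {"province": "A", "city": "x", "confirmed": "2"},
        {"province": "B", "city": "y", "confirmed": "7"},
        {"province": "C", "city": "-", "confirmed": "1"},
    ]
    regions = [
        {"province": "A", "city": "x", "elderly_population_ratio": "15.2"},
        {"province": "B", "city": "y", "elderly_population_ratio": "21.0"},
        {"province": "B", "city": "z", "elderly_population_ratio": "18.4"},
    ]
    totals = total_cases_by_city(cases)
    for row in merge_with_regions(totals, regions):
        print(row)
    print("unmatched:", unmatched_keys(totals, regions))
```

Checking the unmatched keys before analysis guards against spelling differences or placeholder city values that would otherwise vanish from the merged table without warning.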
Furthermore, we plan to use regression analysis, for example, to explain which factors influence the incidence of COVID-19 in each region. We can do this using the "COVID-19 Infection Cases Data" dataset, which contains information on infection cases in South Korea, including the province and city. We will merge this dataset with the "Location and Statistical Data" dataset, which contains information on educational institutions, elderly population ratios, nursing home counts, and more. Using statistical regression models, we aim to gain a deeper understanding of the pandemic's impact on the different regions of South Korea.
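To illustrate the regression step, here is a dependency-free sketch of ordinary least squares on the merged table. The outcome is confirmed cases per city, and the predictors are regional statistics. The choice of a plain linear model, the helper names `ols` and `r_squared`, and the predictor columns (`elderly_population_ratio`, `nursing_home_count`) are assumptions for illustration. All numbers in the usage example are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares fit; returns [intercept, b1, ..., bk].

    X holds one row of k predictor values per region; y holds the outcome per region.
    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading column of ones.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * float(yi) for a, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        # Rough absolute threshold; a near-zero pivot means the system is (nearly) singular.
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few regions")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * c for m, c in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def r_squared(X, y, coefs):
    """Share of the variance in y explained by the fitted linear model."""
    preds = [coefs[0] + sum(b * float(v) for b, v in zip(coefs[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical predictors per city: [elderly_population_ratio, nursing_home_count].
    X = [[15.2, 120], [21.0, 340], [18.4, 95], [12.7, 410], [25.3, 60], [16.8, 210]]
    y = [40, 95, 33, 70, 52, 48]  # hypothetical confirmed cases per city
    coefs = ols(X, y)
    print("intercept and slopes:", [round(c, 3) for c in coefs])
    print("R^2:", round(r_squared(X, y, coefs), 3))
```

Solving the normal equations directly keeps the sketch short but can be numerically fragile. For the actual analysis, a library such as statsmodels would also report standard errors and p-values. Because case counts are counts, a Poisson or negative binomial model may also be worth comparing against the linear fit.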