# Covid-19 Cases in Thailand

- Write a program that reads a CSV file of Covid-19 cases in Thailand reported since 12 August 2021.

## Steps

1. Read the data from the file 'confirmed-cases-since-120864.csv'.
2. Transform the data into a structure suited to the task (a list of dictionaries, one per case).
3. Process the data.

In [1]:
# A function that reads a file into a list of lines.

def read_data(file_name):
    # 'utf-8-sig' strips the byte order mark (BOM) if the file starts with one.
    with open(file_name, mode='r', encoding='utf-8-sig') as f:
        data = f.readlines()

    return data

In [2]:
# A function that transforms the raw lines into a list of dictionaries,
# one per case, keyed by column name.

def transform_data(data, keys):
    cases = []

    for item in data:
        # Naive split: assumes that no field contains a comma.
        record = item.strip().split(',')
        # zip pairs each key with its value and stops at the shorter sequence.
        cases.append(dict(zip(keys, record)))

    return cases
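
Splitting each line on `','` works only while no field contains a quoted comma. As a hedged alternative, the standard-library `csv` module handles quoting and builds the dictionaries directly. The sketch below is not part of the original notebook: the helper name `read_cases`, the shortened header, and the quoted `risk` value are made up for illustration.

```python
import csv
import io

def read_cases(f):
    """Return a list of dicts, one per CSV row, keyed by the header row."""
    return list(csv.DictReader(f))

# Hypothetical in-memory sample; the quoted field contains a comma.
sample = io.StringIO(
    'No.,announce_date,sex,risk\n'
    '1,12/8/2021,ชาย,"contact, household"\n'
)
parsed = read_cases(sample)
print(parsed[0]['risk'])  # contact, household
```

With a real file, the same helper could be called on a handle opened with `open(file_name, newline='', encoding='utf-8-sig')`.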

In [3]:
# A function that returns number of cases for a given date.

def number_of_cases(cases, date):
    count = 0
    for case in cases:
        if 'announce_date' in case.keys() and case['announce_date'] == date:
            count += 1
            
    return count
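
When counts for many dates are needed, scanning the full list once per date becomes slow. A minimal sketch, assuming the same list-of-dictionaries structure, tallies every date in a single pass with `collections.Counter`. The name `cases_per_date` and the sample rows are illustrative only.

```python
from collections import Counter

def cases_per_date(cases):
    """Count cases for every announce_date in a single pass."""
    return Counter(
        case['announce_date'] for case in cases if 'announce_date' in case
    )

# Hypothetical sample rows; the empty dict stands for a malformed record.
sample_cases = [
    {'announce_date': '12/8/2021'},
    {'announce_date': '14/8/2021'},
    {'announce_date': '14/8/2021'},
    {},
]
per_date = cases_per_date(sample_cases)
print(per_date['14/8/2021'])  # 2
```

A `Counter` returns 0 for a date that never occurs, so lookups need no extra key check.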

In [4]:
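# A function that returns the number of male, female, and unknown-sex cases
# for a range of dates (both ends inclusive).
# Caution: dates are compared as 'd/m/yyyy' strings, so the comparison is
# lexicographic, not chronological; for example, '14/9/2021' falls between
# '14/8/2021' and '16/8/2021'.

def number_of_cases_by_sex(cases, start_date, end_date):
    males = 0
    females = 0
    unknown = 0

    for case in cases:
        if 'announce_date' in case and start_date <= case['announce_date'] <= end_date:
            if case['sex'] == 'ชาย':  # male
                males += 1
            elif case['sex'] == 'หญิง':  # female
                females += 1
            else:
                unknown += 1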
            
    return males, females, unknown
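
Because the dates are stored as `d/m/yyyy` strings, the `<=` comparison in this function orders them character by character rather than chronologically. A minimal sketch of a chronological check, assuming every date follows the `d/m/yyyy` pattern seen in the data, parses the strings first. The helper names `parse_date` and `in_date_range` are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = '%d/%m/%Y'  # assumed format, e.g. '14/8/2021'

def parse_date(text):
    """Convert a 'd/m/yyyy' string to a datetime.date."""
    return datetime.strptime(text, DATE_FORMAT).date()

def in_date_range(date_text, start_text, end_text):
    """Return True if date_text lies chronologically within the range."""
    return parse_date(start_text) <= parse_date(date_text) <= parse_date(end_text)

# String comparison accepts a September date for an August range.
print('14/8/2021' <= '14/9/2021' <= '16/8/2021')             # True
# Parsed dates reject it.
print(in_date_range('14/9/2021', '14/8/2021', '16/8/2021'))  # False
```

Swapping such a check into the condition would change the counts the function returns, so any previously recorded results would need to be regenerated.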

In [5]:
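# A function that returns the age distribution of the cases.
# age_range is a list of inclusive (low, high) pairs,
# e.g. [(0,19), (20,39), (40,59), (60,79), (80,99), (100,119)].

def number_of_cases_by_ages(cases, age_range):
    # One counter per age range, plus a final counter for unrecognized units.
    # Named 'bins' to avoid shadowing the built-in bin().
    bins = [0] * (len(age_range) + 1)

    for case in cases:
        if 'Unit' in case:
            if case['Unit'] == 'เดือน' or case['Unit'] == 'วัน':
                # Age given in months ('เดือน') or days ('วัน'): count in the first bin.
                bins[0] += 1
            elif case['Unit'] == 'ปี':
                # Age given in years ('ปี'): count in the first range that contains it.
                for i in range(len(bins) - 1):
                    if age_range[i][0] <= float(case['age']) <= age_range[i][1]:
                        bins[i] += 1
                        break
            else:
                bins[-1] += 1

    return bins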
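
Note that an age in years is counted only if it falls inside one of the given ranges; a fractional age between two ranges or an age above the last range is silently skipped. The sketch below isolates the range lookup to make that behavior visible. The helper name `age_bin_index` is hypothetical and not part of the original notebook.

```python
def age_bin_index(age, age_range):
    """Return the index of the first (low, high) pair containing age, or None."""
    for i, (low, high) in enumerate(age_range):
        if low <= age <= high:
            return i
    return None

age_range = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 99), (100, 119)]
print(age_bin_index(34, age_range))    # 1
print(age_bin_index(19.5, age_range))  # None: falls between two ranges
print(age_bin_index(120, age_range))   # None: above the last range
```

Returning `None` lets the caller decide whether to count such cases in a separate bin or to ignore them.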

### Main program

In [6]:
data = read_data('confirmed-cases-since-120864.csv')

In [7]:
print(len(data))
print(data[0], end = '')
print(data[1], end = '')

1042184
No.,announce_date,Notified date,sex,age,Unit,nationality,province_of_isolation,risk,province_of_onset,district_of_onset
816990,12/8/2021,11/8/2021,ชาย,7,ปี,Thailand,เชียงราย,อื่นๆ,,


In [8]:
keys = data.pop(0).strip().split(',')
print(keys)

['No.', 'announce_date', 'Notified date', 'sex', 'age', 'Unit', 'nationality', 'province_of_isolation', 'risk', 'province_of_onset', 'district_of_onset']


In [9]:
cases = transform_data(data, keys)
print(cases[102])

{'No.': '817092', 'announce_date': '12/8/2021', 'Notified date': '11/8/2021', 'sex': 'ชาย', 'age': '34', 'Unit': 'ปี', 'nationality': 'Thailand', 'province_of_isolation': 'น่าน', 'risk': 'อื่นๆ', 'province_of_onset': '', 'district_of_onset': ''}


In [10]:
# number of cases for a given date.

print(number_of_cases(cases, '14/8/2021'))

22086


In [11]:
# number of males and females for a range of dates.

print(number_of_cases_by_sex(cases, '14/8/2021', '16/8/2021'))

(51589, 55674, 4580)


In [12]:
print(number_of_cases_by_ages(cases, [(0,19), (20,39), (40,59), (60,79), (80,99), (100, 119)]))

[191634, 396090, 272375, 88497, 13487, 85, 80000]
