## **Association mining to find hotspots based on a Patient Route Data**
<author> &copy; Prepared by Oscar Mendoza Cerna

In [1]:
# Import packages
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from apyori import apriori
import warnings
warnings.filterwarnings("ignore")
from collections import defaultdict
import subprocess
import re

The dataset contains route information of 911 COVID-19 positive patients. Two variables, patient_id and global_num, specify a unique identifier for each patient; one variable, date, presents the date of a patient’s visit; three variables, location and latitude combined with longitude, define the geographic location of a patient’s visit.

In [2]:
# load the dataset
df = pd.read_csv('D1.csv')

In [3]:
# info and the first 10 transactions
print(df.info())
df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1544 entries, 0 to 1543
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   patient_id  1544 non-null   object 
 1   global_num  975 non-null    float64
 2   date        1544 non-null   object 
 3   location    1544 non-null   object 
 4   latitude    1544 non-null   float64
 5   longitude   1544 non-null   float64
dtypes: float64(3), object(3)
memory usage: 72.5+ KB
None


Unnamed: 0,patient_id,global_num,date,location,latitude,longitude
0,P1000000501,2.0,22/04/2020,Chittorgarh_Rajasthan,24.879999,74.629997
1,P1000000501,2.0,24/04/2020,Ratnagiri_Maharashtra,16.994444,73.300003
2,P1000000502,5.0,26/04/2020,Pindwara_Rajasthan,24.7945,73.055
3,P1000000502,5.0,27/04/2020,Raipur_Chhattisgarh,21.25,81.629997
4,P1000000502,5.0,28/04/2020,Gokak_Karnataka,16.1667,74.833298
5,P1000000504,7.0,30/04/2020,Lucknow_Uttar Pradesh,26.85,80.949997
6,P1000000505,9.0,30/04/2020,Lucknow_Uttar Pradesh,26.85,80.949997
7,P1000000506,10.0,30/04/2020,Delhi_Delhi,28.679079,77.06971
8,P1000000507,11.0,30/04/2020,Delhi_Delhi,28.679079,77.06971
9,P1000000508,13.0,30/04/2020,Ratnagiri_Maharashtra,16.994444,73.300003


In [5]:
# There are 911 patients with covid positive
df['patient_id'].describe()

count            1544
unique            911
top       P3013000501
freq                6
Name: patient_id, dtype: object

In [6]:
# There are 151 locations
df['location'].describe()

count                       1544
unique                       151
top       Sardarshahar_Rajasthan
freq                         134
Name: location, dtype: object

For pre-processing, firstly the data type of date was changed from object into datetime64 in the case that sequence analysis is required in the future.
The dataset consists of 1544 rows/observations, indicating a single patient can have multiple rows as he/she might have travelled to multiple locations in their visit(s). Therefore, to identify their common routes using association mining, patient_id was grouped to generate a list of all location travelled

In [9]:
# Pre-processing: change datatype for sequence analysis later
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df[['date']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1544 entries, 0 to 1543
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    1544 non-null   datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 12.2 KB


In [10]:
# Pre-processing: generating transactional data for association mining
# group by patient_id and list all location
transactions = df.groupby(['patient_id'])['location'].apply(list)
print(str(transactions.count()) + " transactions")
transactions.head(3)

911 transactions


patient_id
P1000000501       [Chittorgarh_Rajasthan, Ratnagiri_Maharashtra]
P1000000502    [Pindwara_Rajasthan, Raipur_Chhattisgarh, Goka...
P1000000504                              [Lucknow_Uttar Pradesh]
Name: location, dtype: object

In the association analysis, variable patient_id, instead of global_num, was included as the transaction identification, because the latter had 569 missing values. Variable location, rather than latitude and longitude, was included as item of transaction, because the former can define a geographic location independently.

### Determine min_support and min_confidence

In [12]:
# type cast the transactions from pandas into normal list format and run apriori
transaction_list = list(transactions)

# test min_support (0.010 to 0.001) and determine min_confidence
results_test = list(apriori(transaction_list, min_support=0.004))

# print first 5 rules
print(results_test[:5])

[RelationRecord(items=frozenset({'Alipurduar_West Bengal'}), support=0.026344676180021953, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Alipurduar_West Bengal'}), confidence=0.026344676180021953, lift=1.0)]), RelationRecord(items=frozenset({'Amalner_Maharashtra'}), support=0.0043907793633369925, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Amalner_Maharashtra'}), confidence=0.0043907793633369925, lift=1.0)]), RelationRecord(items=frozenset({'Ambernath_Maharashtra'}), support=0.006586169045005488, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Ambernath_Maharashtra'}), confidence=0.006586169045005488, lift=1.0)]), RelationRecord(items=frozenset({'Anand_Gujarat'}), support=0.01646542261251372, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Anand_Gujarat'}), confidence=0.01646542261251372, lift=1.0)]), RelationRecord(items=frozenset({'Anantnag

In [13]:
# Change output format
def convert_apriori_results_to_pandas_df(results_test):
    rules = []
    
    for rule_set in results_test:
        for rule in rule_set.ordered_statistics:
            # items_base = left side of rules, items_add = right side
            # support, confidence and lift for respective rules
            rules.append([','.join(rule.items_base), ','.join(rule.items_add),
                         rule_set.support, rule.confidence, rule.lift]) 
    
    # typecast it to pandas df
    return pd.DataFrame(rules, columns=['Left_side', 'Right_side', 'Support', 
                                        'Confidence', 'Lift']) 

result_df = convert_apriori_results_to_pandas_df(results_test)

print("Number of rules acquired", len(result_df))
print("")
print(result_df.head(5))

Number of rules acquired 211

  Left_side                  Right_side   Support  Confidence  Lift
0                Alipurduar_West Bengal  0.026345    0.026345   1.0
1                   Amalner_Maharashtra  0.004391    0.004391   1.0
2                 Ambernath_Maharashtra  0.006586    0.006586   1.0
3                         Anand_Gujarat  0.016465    0.016465   1.0
4            Anantnag_Jammu and Kashmir  0.009879    0.009879   1.0


In [15]:
# sort all acquired rules descending by Confidence
result_df1 = result_df.sort_values(by='Confidence', ascending=False)
result_df1.head(100)

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
209,"Panaji_Goa,Shivpuri_Madhya Pradesh",Raiganj_West Bengal,0.004391,1.000000,29.387097
208,"Raiganj_West Bengal,Panaji_Goa",Shivpuri_Madhya Pradesh,0.004391,1.000000,70.076923
98,Gondal_Gujarat,Chalakudy_Kerala,0.007684,0.700000,12.503922
182,Shivpuri_Madhya Pradesh,Raiganj_West Bengal,0.008782,0.615385,18.084367
95,Chirala_Andhra Pradesh,Chalakudy_Kerala,0.004391,0.571429,10.207283
...,...,...,...,...,...
59,,Raiganj_West Bengal,0.034029,0.034029,1.000000
73,,Sirohi_Rajasthan,0.030735,0.030735,1.000000
140,Sardarshahar_Rajasthan,Karaikal_Puducherry,0.004391,0.029851,1.133085
126,,"Gokak_Karnataka,Sardarshahar_Rajasthan",0.027442,0.027442,1.000000


In [11]:
# sort all acquired rules descending by lift and output top-5 rules
result_df = result_df.sort_values(by='Lift', ascending=False)
result_df.head(5)

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
208,"Raiganj_West Bengal,Panaji_Goa",Shivpuri_Madhya Pradesh,0.004391,1.0,70.076923
207,Shivpuri_Madhya Pradesh,"Panaji_Goa,Raiganj_West Bengal",0.004391,0.307692,70.076923
210,"Shivpuri_Madhya Pradesh,Raiganj_West Bengal",Panaji_Goa,0.004391,0.5,56.9375
205,Panaji_Goa,"Shivpuri_Madhya Pradesh,Raiganj_West Bengal",0.004391,0.5,56.9375
178,Panaji_Goa,Shivpuri_Madhya Pradesh,0.004391,0.5,35.038462


In [16]:
# Output ‘min_support’ and 'min_confidence'
min_support = result_df['Support'].min()*100
# When mean of the confidence is selected 
min_confidence = result_df['Confidence'].mean()*100 

print('The min_support is ' + '{:.4f}%'.format(min_support)) 
print('The min_confidence is ' + '{:.4f}%'.format(min_confidence) + ', if determined by mean of the confidence')    

The min_support is 0.4391%
The min_confidence is 9.9266%, if determined by mean of the confidence


In [17]:
# When median of the confidence is selected 
min_confidence = result_df['Confidence'].median()*100 
print('The min_support is ' + '{:.4f}%'.format(min_support)) 
print('The min_confidence is ' + '{:.4f}%'.format(min_confidence) + ', if determined by median of the confidence')  

The min_support is 0.4391%
The min_confidence is 2.0856%, if determined by median of the confidence


### min_support 0.4% and min_confidence 10% used in association analysis

In [18]:
# apply the Apriori algorithm on the dataset
results = list(apriori(transaction_list, min_support=0.004, min_confidence=0.1))

# print first 5 rules
print(results_test[:5])

[RelationRecord(items=frozenset({'Alipurduar_West Bengal'}), support=0.026344676180021953, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Alipurduar_West Bengal'}), confidence=0.026344676180021953, lift=1.0)]), RelationRecord(items=frozenset({'Amalner_Maharashtra'}), support=0.0043907793633369925, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Amalner_Maharashtra'}), confidence=0.0043907793633369925, lift=1.0)]), RelationRecord(items=frozenset({'Ambernath_Maharashtra'}), support=0.006586169045005488, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Ambernath_Maharashtra'}), confidence=0.006586169045005488, lift=1.0)]), RelationRecord(items=frozenset({'Anand_Gujarat'}), support=0.01646542261251372, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Anand_Gujarat'}), confidence=0.01646542261251372, lift=1.0)]), RelationRecord(items=frozenset({'Anantnag

In [19]:
# Change output format
def convert_apriori_results_to_pandas_df(results):
    rules = []
    
    for rule_set in results:
        for rule in rule_set.ordered_statistics:
            # items_base = left side of rules, items_add = right side
            # support, confidence and lift for respective rules
            rules.append([','.join(rule.items_base), ','.join(rule.items_add),
                         rule_set.support, rule.confidence, rule.lift]) 
    
    # typecast it to pandas df
    return pd.DataFrame(rules, columns=['Left_side', 'Right_side', 'Support', 
                                        'Confidence', 'Lift']) 

result_df = convert_apriori_results_to_pandas_df(results)

print("Number of rules acquired", len(result_df))
print("")
print(result_df.head(5))

Number of rules acquired 54

                   Left_side                 Right_side   Support  Confidence  \
0                                Sardarshahar_Rajasthan  0.147091    0.147091   
1     Alipurduar_West Bengal       Ranebennur_Karnataka  0.004391    0.166667   
2                 Barh_Bihar       Ranebennur_Karnataka  0.004391    0.363636   
3  Bhimavaram_Andhra Pradesh           Chalakudy_Kerala  0.006586    0.545455   
4           Chalakudy_Kerala  Bhimavaram_Andhra Pradesh  0.006586    0.117647   

       Lift  
0  1.000000  
1  1.921941  
2  4.193326  
3  9.743316  
4  9.743316  


#### Top-5 rules  (0.4%, 10%)

In [20]:
# sort all acquired rules descending by lift and output top-5 rules
result_df = result_df.sort_values(by='Lift', ascending=False)
result_df.head(20)

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
51,"Raiganj_West Bengal,Panaji_Goa",Shivpuri_Madhya Pradesh,0.004391,1.0,70.076923
50,Shivpuri_Madhya Pradesh,"Raiganj_West Bengal,Panaji_Goa",0.004391,0.307692,70.076923
48,Panaji_Goa,"Raiganj_West Bengal,Shivpuri_Madhya Pradesh",0.004391,0.5,56.9375
53,"Raiganj_West Bengal,Shivpuri_Madhya Pradesh",Panaji_Goa,0.004391,0.5,56.9375
37,Shivpuri_Madhya Pradesh,Panaji_Goa,0.004391,0.307692,35.038462
36,Panaji_Goa,Shivpuri_Madhya Pradesh,0.004391,0.5,35.038462
49,Raiganj_West Bengal,"Panaji_Goa,Shivpuri_Madhya Pradesh",0.004391,0.129032,29.387097
52,"Panaji_Goa,Shivpuri_Madhya Pradesh",Raiganj_West Bengal,0.004391,1.0,29.387097
41,Vasco da Gama_Goa,Ranaghat_West Bengal,0.005488,0.384615,25.027473
40,Ranaghat_West Bengal,Vasco da Gama_Goa,0.005488,0.357143,25.027473


#### Top-5 common routes travelled from the town Chalakudy in Kerala state (0.4%, 10%)

In [21]:
# min_support 0.4% and min_confidence 10%
route_from_Chalakudy = result_df.loc[result_df.Left_side == 'Chalakudy_Kerala']
route_from_Chalakudy[:5]

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
6,Chalakudy_Kerala,Gondal_Gujarat,0.007684,0.137255,12.503922
4,Chalakudy_Kerala,Bhimavaram_Andhra Pradesh,0.006586,0.117647,9.743316


### min_support 0.4% and min_confidence 5% used in association analysis

In [22]:
# apply the Apriori algorithm on the dataset
results = list(apriori(transaction_list, min_support=0.004, min_confidence=0.05))

# print first 5 rules
print(results_test[:5])

[RelationRecord(items=frozenset({'Alipurduar_West Bengal'}), support=0.026344676180021953, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Alipurduar_West Bengal'}), confidence=0.026344676180021953, lift=1.0)]), RelationRecord(items=frozenset({'Amalner_Maharashtra'}), support=0.0043907793633369925, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Amalner_Maharashtra'}), confidence=0.0043907793633369925, lift=1.0)]), RelationRecord(items=frozenset({'Ambernath_Maharashtra'}), support=0.006586169045005488, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Ambernath_Maharashtra'}), confidence=0.006586169045005488, lift=1.0)]), RelationRecord(items=frozenset({'Anand_Gujarat'}), support=0.01646542261251372, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Anand_Gujarat'}), confidence=0.01646542261251372, lift=1.0)]), RelationRecord(items=frozenset({'Anantnag

In [23]:
# Change output format
def convert_apriori_results_to_pandas_df(results):
    rules = []
    
    for rule_set in results:
        for rule in rule_set.ordered_statistics:
            # items_base = left side of rules, items_add = right side
            # support, confidence and lift for respective rules
            rules.append([','.join(rule.items_base), ','.join(rule.items_add),
                         rule_set.support, rule.confidence, rule.lift]) 
    
    # typecast it to pandas df
    return pd.DataFrame(rules, columns=['Left_side', 'Right_side', 'Support', 
                                        'Confidence', 'Lift']) 

result_df = convert_apriori_results_to_pandas_df(results)

print("Number of association rules acquired", len(result_df))
print("")
print(result_df.head(5))

Number of association rules acquired 89

  Left_side             Right_side   Support  Confidence  Lift
0                 Chalakudy_Kerala  0.055982    0.055982   1.0
1            Channapatna_Karnataka  0.055982    0.055982   1.0
2                  Gokak_Karnataka  0.054885    0.054885   1.0
3                    Kollam_Kerala  0.059276    0.059276   1.0
4            Lucknow_Uttar Pradesh  0.086718    0.086718   1.0


#### Top-5 rules  (0.4%, 5%)

In [24]:
# sort all acquired rules descending by lift and output top-5 rules
result_df = result_df.sort_values(by='Lift', ascending=False)
result_df.head(5)

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
86,"Raiganj_West Bengal,Panaji_Goa",Shivpuri_Madhya Pradesh,0.004391,1.0,70.076923
85,Shivpuri_Madhya Pradesh,"Raiganj_West Bengal,Panaji_Goa",0.004391,0.307692,70.076923
88,"Raiganj_West Bengal,Shivpuri_Madhya Pradesh",Panaji_Goa,0.004391,0.5,56.9375
83,Panaji_Goa,"Raiganj_West Bengal,Shivpuri_Madhya Pradesh",0.004391,0.5,56.9375
67,Panaji_Goa,Shivpuri_Madhya Pradesh,0.004391,0.5,35.038462


#### Top-5 common routes travelled from the town Chalakudy in Kerala state (0.4%, 5%)

In [25]:
# min_support 0.4% and min_confidence 5%

route_from_Chalakudy = result_df.loc[result_df.Left_side == 'Chalakudy_Kerala']
route_from_Chalakudy[:5]

Unnamed: 0,Left_side,Right_side,Support,Confidence,Lift
16,Chalakudy_Kerala,Gondal_Gujarat,0.007684,0.137255,12.503922
14,Chalakudy_Kerala,Chirala_Andhra Pradesh,0.004391,0.078431,10.207283
13,Chalakudy_Kerala,Bhimavaram_Andhra Pradesh,0.006586,0.117647,9.743316
21,Chalakudy_Kerala,Sinnar_Maharashtra,0.005488,0.098039,5.253749
18,Chalakudy_Kerala,Markapur_Andhra Pradesh,0.004391,0.078431,5.103641


### Sequence analysis
Sequence analysis can be performed on this dataset because of orders of the locations that these patients have travelled to can be identified by sorting the dates of their visits. Here, a new attribute trip was created to hold the information on the order. In the case that one patient has travelled to multiple locations on a single date, the order of the locations was determined according to their order in the original dataset. Then, sequential rules were generated using SPMF with min_support threshold of 0.9% and min_confidence threshold of 75%.

In [26]:
# sorting the date of visit
df = df.sort_values(['patient_id','date'])
# create an attribute to hold the info on the order of the target
df['trip'] = df.groupby('patient_id').cumcount()+1
df.head(10)

Unnamed: 0,patient_id,global_num,date,location,latitude,longitude,trip
0,P1000000501,2.0,2020-04-22,Chittorgarh_Rajasthan,24.879999,74.629997,1
1,P1000000501,2.0,2020-04-24,Ratnagiri_Maharashtra,16.994444,73.300003,2
2,P1000000502,5.0,2020-04-26,Pindwara_Rajasthan,24.7945,73.055,1
3,P1000000502,5.0,2020-04-27,Raipur_Chhattisgarh,21.25,81.629997,2
4,P1000000502,5.0,2020-04-28,Gokak_Karnataka,16.1667,74.833298,3
5,P1000000504,7.0,2020-04-30,Lucknow_Uttar Pradesh,26.85,80.949997,1
6,P1000000505,9.0,2020-04-30,Lucknow_Uttar Pradesh,26.85,80.949997,1
7,P1000000506,10.0,2020-04-30,Delhi_Delhi,28.679079,77.06971,1
8,P1000000507,11.0,2020-04-30,Delhi_Delhi,28.679079,77.06971,1
9,P1000000508,13.0,2020-04-30,Ratnagiri_Maharashtra,16.994444,73.300003,1


In [27]:
# produce sequences in order
trip = df.groupby(['patient_id'])['location'].apply(list)
sequences = trip.values.tolist()
print(sequences[:10])

[['Chittorgarh_Rajasthan', 'Ratnagiri_Maharashtra'], ['Pindwara_Rajasthan', 'Raipur_Chhattisgarh', 'Gokak_Karnataka'], ['Lucknow_Uttar Pradesh'], ['Lucknow_Uttar Pradesh'], ['Delhi_Delhi'], ['Delhi_Delhi'], ['Ratnagiri_Maharashtra'], ['Lucknow_Uttar Pradesh'], ['Mumbai_Maharashtra'], ['Ratnagiri_Maharashtra', 'Sagar_Karnataka']]


In [28]:
''' Uses SPMF to find association rules in supplied transactions '''
def get_association_rules(sequences, min_sup, min_conf):
    # step 1: create required input for SPMF
    
    # prepare a dict to uniquely assign each item in the transactions to an int ID
    item_dict = defaultdict(int)
    output_dict = defaultdict(str)
    item_id = 1
    
    # write your sequences in SPMF format
    with open('seq_rule_input.txt', 'w+') as f:
        for sequence in sequences:
            z = []
            for itemset in sequence:
                # if there are multiple items in one itemset
                if isinstance(itemset, list):
                    for item in itemset:
                        if item not in item_dict:
                            item_dict[item] = item_id
                            item_id += 1

                        z.append(item_dict[item])
                else:
                    if itemset not in item_dict:
                        item_dict[itemset] = item_id
                        output_dict[str(item_id)] = itemset
                        item_id += 1
                    z.append(item_dict[itemset])
                    
                # end of itemset
                z.append(-1)
            
            # end of a sequence
            z.append(-2)
            f.write(' '.join([str(x) for x in z]))
            f.write('\n')
    
    # run SPMF with supplied parameters
    supp_param = '{}%'.format(int(min_sup * 100))
    conf_param = '{}%'.format(int(min_conf * 100))
    subprocess.call(['java', '-jar', 'spmf.jar', 'run', 'RuleGrowth', 
                     'seq_rule_input.txt', 'seq_rule_output.txt', 
                     supp_param, conf_param], shell=True)
    
    # read back the output rules
    outputs = open('seq_rule_output.txt', 'r').read().strip().split('\n')
    output_rules = []
    for rule in outputs:
        left, right, sup, conf = re.search(pattern=r'([0-9\,]+) ==> ([0-9\,]+) #SUP: ([0-9]+) #CONF: ([0-9\.]+)', string=rule).groups()
        sup = int(sup) / len(sequences)
        conf = float(conf)
        output_rules.append([[output_dict[x] for x in left.split(',')], [output_dict[x] for x in right.split(',')], sup, conf])
    
    # return pandas DataFrame
    return pd.DataFrame(output_rules, columns = ['Left_rule', 'Right_rule', 'Support', 'Confidence'])

In [29]:
# Using min_supp of 0.009 and min_conf of 0.75.

get_association_rules = get_association_rules(sequences, 0.009, 0.75)
print("Number of sequential rules acquired " + str(len(get_association_rules)))
get_association_rules

Number of sequential rules acquired 517


Unnamed: 0,Left_rule,Right_rule,Support,Confidence
0,"[Chittorgarh_Rajasthan, Belgaum_Karnataka]",[Sagar_Karnataka],0.001098,1.0
1,"[Chittorgarh_Rajasthan, Belgaum_Karnataka, Cha...",[Sagar_Karnataka],0.001098,1.0
2,"[Chittorgarh_Rajasthan, Chatrapur_Odisha]",[Sagar_Karnataka],0.001098,1.0
3,"[Chittorgarh_Rajasthan, Belgaum_Karnataka]","[Sagar_Karnataka, Suri_West Bengal]",0.001098,1.0
4,"[Chittorgarh_Rajasthan, Belgaum_Karnataka, Cha...","[Sagar_Karnataka, Suri_West Bengal]",0.001098,1.0
...,...,...,...,...
512,[Junnar_Maharashtra],"[Bulandshahr_Uttar Pradesh, Mehsana_Gujarat]",0.001098,1.0
513,[Burdwan_West Bengal],[Ranaghat_West Bengal],0.001098,1.0
514,[Ratlam_Madhya Pradesh],[Nimbahera_Rajasthan],0.001098,1.0
515,[Junnar_Maharashtra],[Mehsana_Gujarat],0.001098,1.0


The first rule, [Chittorgarh_Rajasthan, Belgaum_Karnataka] => [Sagar_Karnataka] with 0.0011 support and 1.0 confidence, implies that around 0.11% COVID-19 positive patients have travelled to the town Sagar in Karnataka state after travelling to the town Chittorgarh in Rajasthan state and then the town Belgaum in Karnataka state, and if a positive patient has travelled to the town Chittorgarh in Rajasthan state and then the town Belgaum in Karnataka state, the probability of them travel to the town Sagar in Karnataka state subsequently is 100%.

### Conclusion
The results of these analyses may assist the relevant decision-makers to identify potential expose sites once a COVID-19 positive patient is discovered in a relatively short timeframe and estimate the scale of the outbreak. Subsequent control measures, such as contact tracing, restrictions, and lockdowns, could be announced accordingly.