# Predicting the religion distribution of Australian suburb towns

Bhagya Warnakulasooriya

August 22, 2020

## Introduction

Politics is a blend of the government, political parties and the politicians of a country. In Australia, political discourse has an apparent influence on people's lives, including immigration, the cost of housing, individual freedoms and rights, and much more.

As stated by Winston Churchill, "Politics is not a game, but a serious business". Political parties have both long-term and short-term plans when preparing their campaigns. For insightful planning, they consider factors such as average income per person, religion distribution, employment distribution and natural resource distribution in each area. They take advantage of each factor when appointing candidates for an area and when organizing meetings, workshops and other volunteer work.

My project focuses on predicting the religion distribution of suburb towns in Australia by grouping them into clusters based on the most common religious venues in each town, which could help a political party plan its future campaigns.

## Data

Information on all Australian towns was downloaded as a CSV file from https://www.australiantownslist.com. It includes geographical data (latitude and longitude coordinates), name, state, postal code, type (urban or suburb), population, median income, area and so on. Only the suburbs were considered. The Foursquare API was used to retrieve the religious venues within a 2 km radius of each suburb town.

## Methodology

Data cleaning came first. Towns with zero population were removed from the dataframe. Since only suburbs were considered, the dataframe was then filtered to keep only the towns typed as suburbs, which left a total of 127. These suburb towns fall under eight states and territories, namely Western Australia, New South Wales, Victoria, South Australia, Queensland, Australian Capital Territory, Tasmania and Northern Territory.

The latitude and longitude coordinates of each town were readily available in the dataframe, so the basic need was to keep only the name of each town and its coordinates. However, I could not drop the state, because some towns share the same name but belong to different states. Although such towns have different geographical coordinates, carrying both the name and the state forward was more convenient for the later code.
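The cleaning steps above can be sketched with pandas roughly as follows. The column names (`name`, `state`, `type`, `population`, `latitude`, `longitude`) and the toy rows are assumptions for illustration; the real CSV may label them differently.

```python
import pandas as pd

# Toy stand-in for the downloaded CSV; column names and values are assumptions.
towns = pd.DataFrame({
    "name": ["Alpha", "Beta", "Gamma", "Alpha"],
    "state": ["VIC", "NSW", "QLD", "WA"],
    "type": ["Suburb", "Urban", "Suburb", "Suburb"],
    "population": [1200, 50000, 0, 800],
    "latitude": [-37.8, -33.9, -27.5, -31.9],
    "longitude": [144.9, 151.2, 153.0, 115.9],
})

def select_suburbs(df):
    """Drop zero-population rows and keep only rows typed as suburbs."""
    populated = df[df["population"] > 0]
    suburbs = populated[populated["type"].str.lower() == "suburb"]
    # Keep the state next to the name because town names repeat across states.
    columns = ["name", "state", "latitude", "longitude"]
    return suburbs[columns].reset_index(drop=True)

suburbs = select_suburbs(towns)
print(suburbs)
```

In this toy data the two towns named "Alpha" survive as separate rows because the state is kept alongside the name.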

Folium was used to map the suburb towns. While iterating through the dataframe, GET requests were made to Foursquare to retrieve only the religious venue categories around each suburb town. The religious venue categories of each town were stored in another dataframe along with the venue name, venue latitude and venue longitude.
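As a rough sketch of the request and parsing step, the code below builds a search URL and flattens a response into rows. It assumes the Foursquare v2 venue search endpoint and its usual response shape; the endpoint choice, the credentials, the category ID placeholder and the sample payload are all assumptions, not the exact calls used in this project.

```python
from urllib.parse import urlencode

# Assumed endpoint (Foursquare v2 venue search); check it against the API
# version you actually use.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(lat, lng, client_id, client_secret, category_id,
                     radius_m=2000, limit=50, version="20200801"):
    """Build the GET URL for venues of one category around a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius_m,
        "limit": limit,
        "categoryId": category_id,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"

def parse_venues(payload, town, state):
    """Flatten a search response into one row per venue."""
    rows = []
    for venue in payload.get("response", {}).get("venues", []):
        categories = venue.get("categories") or [{}]
        location = venue.get("location", {})
        rows.append({
            "town": town,
            "state": state,
            "venue": venue.get("name"),
            "venue_latitude": location.get("lat"),
            "venue_longitude": location.get("lng"),
            "venue_category": categories[0].get("name"),
        })
    return rows

# Hypothetical payload in the assumed response shape.
sample_payload = {"response": {"venues": [
    {"name": "St Example", "location": {"lat": -37.81, "lng": 144.91},
     "categories": [{"name": "Church"}]},
    {"name": "Corner Cafe", "location": {"lat": -37.82, "lng": 144.92},
     "categories": [{"name": "Cafe"}]},
]}}

url = build_search_url(-37.8, 144.9, "CLIENT_ID", "CLIENT_SECRET", "CATEGORY_ID")
rows = parse_venues(sample_payload, "Alpha", "VIC")
print(rows)
```

The URL would be fetched with an HTTP client of your choice and the decoded JSON passed to `parse_venues`; the rows from all towns can then be collected into a single dataframe.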

<img src="files/imgs/venue_category_table.png">

Surprisingly, this dataframe contained venue categories that are not religious venues. This is because a query around a specific latitude and longitude may capture venues of many categories.
For example, a building could have a church on the ground floor and a restaurant on an upper floor. So I had to keep only the religious venues from the Foursquare JSON results for each town.
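One simple way to do this filtering is to keep only rows whose category appears in an allow-list. The category names below are assumptions based on the venue types mentioned in this report, and the rows are made up.

```python
import pandas as pd

# Assumed allow-list of religious venue categories.
RELIGIOUS_CATEGORIES = {
    "Church", "Mosque", "Temple", "Hindu Temple", "Buddhist Temple",
    "Sikh Temple", "Synagogue", "Shrine", "Spiritual Center",
}

# Made-up venue rows, including two non-religious categories.
venues = pd.DataFrame({
    "town": ["Alpha", "Alpha", "Beta", "Beta"],
    "state": ["VIC", "VIC", "NSW", "NSW"],
    "venue_category": ["Church", "Restaurant", "Mosque", "Cafe"],
})

# Keep only the rows whose category is on the allow-list.
is_religious = venues["venue_category"].isin(RELIGIOUS_CATEGORIES)
religious = venues[is_religious].reset_index(drop=True)
print(religious)
```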

One-hot encoding was applied to the venue categories. The number of venues falling under each venue category was then counted for each town.
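A minimal sketch of this encoding and counting step with pandas is shown here. The column names and rows are illustrative assumptions.

```python
import pandas as pd

# Made-up religious venues, one row per venue.
venues = pd.DataFrame({
    "town": ["Alpha", "Alpha", "Alpha", "Beta"],
    "state": ["VIC", "VIC", "VIC", "NSW"],
    "venue_category": ["Church", "Church", "Mosque", "Synagogue"],
})

# One column per category, with a 1 marking the category of each venue.
onehot = pd.get_dummies(venues["venue_category"], dtype=int)
onehot.insert(0, "state", venues["state"])
onehot.insert(0, "town", venues["town"])

# Summing the indicator columns per town gives the venue counts per category.
counts = onehot.groupby(["town", "state"], as_index=False).sum()
print(counts)
```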

<img src="files/imgs/one_hot_encoding.png">

K-Means clustering, a popular machine learning technique, was used to cluster the suburb towns based on their religious venue distribution. The optimal value of K was found using the elbow method and turned out to be 3.
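The elbow method can be sketched with scikit-learn as follows. The feature matrix here is synthetic (three made-up groups of towns), so the numbers it prints are not the results of this study; the range of K values tried is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic venue counts: three groups of towns with different venue mixes.
rng = np.random.default_rng(0)
centers = np.array([[8, 1, 0], [3, 4, 1], [1, 0, 5]])
X = np.vstack([c + rng.normal(0, 0.3, size=(20, 3)) for c in centers])

# Fit K-Means for several K and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Refit with the K chosen at the elbow and assign a label to every town.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Plotting `inertias` against K shows where the curve flattens; the K at that bend is taken as the elbow.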

<img src="files/imgs/elbow_method.png"> 

Each cluster was then analysed further by considering the number of venues in each religious venue category within the cluster.

## Results

The dataframe below shows the cluster label of each suburb town. The five most common venue categories for each town are also included to make it more informative.
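A table of this kind could be assembled roughly as follows. The counts, cluster labels and column names are hypothetical, and towns with fewer than five categories are padded with `None`.

```python
import pandas as pd

# Made-up venue counts per town, indexed by (town, state).
counts = pd.DataFrame(
    {"Church": [6, 2], "Mosque": [2, 0], "Synagogue": [1, 3]},
    index=pd.MultiIndex.from_tuples(
        [("Alpha", "VIC"), ("Beta", "NSW")], names=["town", "state"]
    ),
)
cluster_labels = [0, 1]  # hypothetical K-Means labels, one per town

def most_common(row, top_n=5):
    """Return the top_n categories with at least one venue, padded with None."""
    ranked = row[row > 0].sort_values(ascending=False, kind="stable")
    names = list(ranked.index[:top_n])
    return names + [None] * (top_n - len(names))

columns = [f"most common venue #{i + 1}" for i in range(5)]
summary = pd.DataFrame(
    [most_common(row) for _, row in counts.iterrows()],
    index=counts.index,
    columns=columns,
)
summary.insert(0, "cluster", cluster_labels)
print(summary)
```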

<img src="files/imgs/cluster_distribution.png">

For instance, the cluster distribution of suburb towns around Melbourne was as follows:

<img src="files/imgs/Melbourne.png">

A pictorial representation of the religious venue categories in each cluster makes the religion distribution easier to understand.

<img src="files/imgs/venue_distribution_cluster_0.png">
<img src="files/imgs/venue_distribution_cluster_1.png">
<img src="files/imgs/venue_distribution_cluster_2.png">

The striking feature is that around 75% of the venues in every cluster are churches. The remaining 25% also includes venue categories that do not indicate any specific religion, namely 'Temples' and 'Spiritual Centers'.
Since Christians/Catholics make up the majority of every cluster, the distribution of each cluster excluding the churches gives a clearer picture of the minority religion distribution.
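The shares with and without churches can be computed as in the sketch below. The counts are made up so that churches come to 75% in each cluster; they are not the project's data.

```python
import pandas as pd

# Made-up venue counts per town with a hypothetical cluster label.
counts = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "Church": [6, 9, 3, 3],
    "Mosque": [2, 1, 1, 0],
    "Synagogue": [0, 2, 0, 1],
})

# Total venues per category in each cluster.
totals = counts.groupby("cluster").sum()

# Share of each category among all religious venues in the cluster.
share_all = totals.div(totals.sum(axis=1), axis=0)

# Share of each category once churches are excluded.
minor = totals.drop(columns="Church")
share_minor = minor.div(minor.sum(axis=1), axis=0)

print(share_all.round(2))
print(share_minor.round(2))
```

Each row of `share_minor` sums to 1, so it can be passed directly to a pie chart for the corresponding cluster.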

<img src="files/imgs/cluster0_churches_excluded.png">

In cluster 0, Mosques account for the largest share once the Spiritual Centers and Temples are set aside. Thus, the largest minority here is Muslims. Hindus come second, since Hindu Temples and Shrines are taken here to represent Hinduism. Then come Buddhism, Sikhism and Judaism, in that order.

<img src="files/imgs/cluster1_churches_excluded.png">

According to the pie chart above, Mosques, representing Muslims, again cover the largest share. Jews are noticeably more present here than in cluster 0, making Judaism the second most common religion in this cluster. Hindus come third, with 7% of the venues (Hindu Temples and Shrines). Buddhists make the smallest contribution.

<img src="files/imgs/cluster2_churches_excluded.png">

The last cluster shows the same Muslim majority. Apart from Islam, the only other religions represented in this cluster are Buddhism and Judaism.


## Discussion

When classifying objects into clusters, the objects within a cluster should share similar characteristics, while objects in different clusters should be clearly distinguishable. Although the clusters here look similar, with about 75% churches and around 25% other religious venues, the difference among them lies in that 25% of venues, which represents the minority religions.

Within this minority share, Islam is the most common religion in all three clusters. Cluster 0 has a diverse mix of religions including Hinduism, Buddhism, Sikhism and Judaism. Cluster 1 consists of Judaism, Hinduism and Buddhism, with a considerable share of Judaism. In contrast, cluster 2 contains no Hindu venues, has equal shares of Jewish and Buddhist venues, and is the least diverse cluster.

It is fair to assume that the religion distribution of a town is strongly correlated with the number of religious venues for each religion there. For example, if a town has more mosques, then more Muslims are likely to live there.

In a nutshell, setting aside the two largest groups, Christian/Catholic and Islam, the towns of cluster 0 have a variety of religions, cluster 1 has more Jews and Hindus, and cluster 2 has Jews and Buddhists.

## Conclusion

In this study, the suburb towns of Australia were grouped into three clusters based on the religious venue distribution of each town, using K-Means clustering. The venue distribution of each cluster was then used to predict the religion distribution of the three clusters. Thus, a political party could get an idea of the religion distribution of a town by finding out which cluster the town belongs to.

I recommend that a political party treat every cluster as having Christian/Catholic as the most common religion and Islam as the second, with the remaining minority being multi-religious in cluster 0, mostly Jews and Hindus in cluster 1, and Jews and Buddhists in cluster 2. In this way, a political party could organize similar campaigns in towns that belong to the same cluster and use the religion distribution of each town to make important decisions in its future endeavors.