Global_Education_Cluster_Analysis

Vector Support Model to analyze education data, obtained from the world bank, based on country.

The 'data.csv' file is too big to upload, but as an example of how the program works:

After compiling the program, in the Python REPL commandline enter

D = Dataset()

This will read in data from data.csv, definitions.csv, and codes.csv into a nested dictionary keyed by country then series code.

To represent these datapoints as vectors, enter:

A = Analysis(D)

From here you can enter A.KNN('3-letter country code') to find the "K nearest neighbors" or countries with the most similar education data.

For example

Input: A.KNN('ITA') #Italy

Output: [(0.22176569754073772, 'SMR', 'San Marino', 'Europe'), (0.32641356366071733, 'CZE', 'Czech Republic', 'Europe'), (0.35834790089150226, 'MKD', 'Macedonia, the former Yugoslav Republic of', 'Europe'), (0.3679861100113707, 'MNE', 'Montenegro', 'Europe'), (0.3884295503375777, 'PLW', 'Palau', 'Oceania')]

The output is a list of the 5 countries with the shortest euclidian distances between their vector representations

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
Education_data_analysis.py		Education_data_analysis.py
README.md		README.md
codes.csv		codes.csv
defn.csv		defn.csv
proj1b.pdf		proj1b.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Global_Education_Cluster_Analysis

About

Releases

Packages

Languages

aphoffmann/Global_Education_Cluster_Analysis

Folders and files

Latest commit

History

Repository files navigation

Global_Education_Cluster_Analysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages