# Network Science Project - Analysis of Sports Facilities in Estonia

## Purpose
This notebook documents the analysis for a Network Science course project on sports facilities and sports organisations in Estonia. It focuses on preparing the data and on analysing the relationships it contains as a network.

## Structure
- **Initial Setup**: Import necessary libraries and set up the environment for analysis.
- **Data Loading**: Load the two datasets on sports facilities and sports organisations in Estonia.
- **Data Preprocessing**: Convert columns to correct formats, process numerical data, and prepare the data for network analysis.

## Goals
- To understand the connections and relationships within the data.
- To apply network science methodologies to uncover patterns and insights.

## Data Sources
- JSON files from Eesti spordiregister (Estonian Sports Register)

## Preprocessing

### Sports Facilities dataset

In [1]:
import json
import pandas as pd

data = pd.read_json('spordiehitised.json')

# Parse the JSON-like string and extract information
def extract_info(json_str):
    try:
        if not isinstance(json_str, str):
            json_str = str(json_str)
        json_str = json_str.replace("'", '"').replace('\\', '\\\\')

        data_dict = json.loads(json_str)
        first_key = list(data_dict.keys())[0]
        info = data_dict[first_key]
        
        # Extract required fields
        objektid = info.get('objektid', '')
        objektnimi = info.get('objektnimi', '')
        tehnseisukord = info.get('tehnseisukord', '')
        
        # Extract 'spordialad', handling nested dictionary keys
        spordialad_keys = info.get('spordialad', {}).keys()
        spordialad = ', '.join([info['spordialad'][key].get('spordiala', '') for key in spordialad_keys])
        
        return pd.Series([objektid, objektnimi, tehnseisukord, spordialad])
    except Exception as e:
        # Log error details
        with open('error_log.txt', 'a') as f:
            f.write(f"Error processing JSON: {e} - Data: {json_str[:100]}\n")
        return pd.Series([None, None, None, None])

# Define new columns
new_columns = ['objektid', 'objektnimi', 'tehnseisukord', 'spordialad']
data[new_columns] = pd.DataFrame(columns=new_columns)  # Initialize the new columns

data[new_columns] = data['paigad'].apply(extract_info)

# Remove unnecessary columns
columns_to_keep = [
    'rajatisid', 'rajatisnimi', 'kompleks', 'liik', 'omandiliik', 'ehstaatus',
    'om_oigvorm_id', 'omaniknimi', 'omanikregkood', 'va_oigvorm_id',
    'valdajanimi', 'valdajaregkood', 'kaart_laius', 'kaart_pikkus', 'maakond',
    'maakond_kood', 'kov', 'asustusyksus', 'asustusyksus_kood', 'riietusruumidemahutavus',
    'objektid', 'objektnimi', 'tehnseisukord', 'spordialad'
]

data = data[columns_to_keep]
data.to_csv('spordiehitised.csv', index=False)

# Work with CSV file
processed_data = pd.read_csv('spordiehitised.csv')

# Convert all string columns to lowercase
processed_data = processed_data.apply(lambda x: x.str.lower() if x.dtype == "object" else x)
# Fill NaN values with 0
processed_data = processed_data.fillna(0)

# Cast object columns to str, then cast ID and code columns to int
for column in processed_data.columns:
    if processed_data[column].dtype == 'object':
        processed_data[column] = processed_data[column].astype(str)
columns_to_int = [
    'omanikregkood', 'valdajaregkood', 'maakond_kood', 'asustusyksus_kood', 'riietusruumidemahutavus',
    'objektid'
]
for column in columns_to_int:
    processed_data[column] = pd.to_numeric(processed_data[column], errors='coerce').fillna(0).astype(int)

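The `extract_info` function above passes each `paigad` value through `str()` and quote replacement before calling `json.loads`. That round trip fails whenever a value contains an apostrophe or renders as `None`, `True`, or `False`, and the row is then logged and filled with `None`. If `pd.read_json` already delivers `paigad` as a dictionary (an assumption worth checking with `type(data['paigad'].iloc[0])`), the fields can be read directly. The following is a minimal sketch under that assumption; `extract_info_from_dict` and `sample_paigad` are hypothetical names, and the sample values are invented to mirror the keys used above.

```python
# Hypothetical record shaped like one entry of the 'paigad' column.
# The keys come from extract_info above; the values are invented.
sample_paigad = {
    "101": {
        "objektid": 101,
        "objektnimi": "Example stadium",
        "tehnseisukord": "good",
        "spordialad": {
            "1": {"spordiala": "football"},
            "2": {"spordiala": "athletics"},
        },
    }
}


def extract_info_from_dict(paigad):
    """Return (objektid, objektnimi, tehnseisukord, spordialad) for the first object."""
    # Anything that is not a non-empty dict is treated as missing
    if not isinstance(paigad, dict) or not paigad:
        return (None, None, None, None)

    # Like extract_info, look only at the first object in the mapping
    info = next(iter(paigad.values()))

    # Join the sport names from the nested 'spordialad' mapping
    sports = info.get("spordialad") or {}
    spordialad = ", ".join(s.get("spordiala", "") for s in sports.values())

    return (
        info.get("objektid", ""),
        info.get("objektnimi", ""),
        info.get("tehnseisukord", ""),
        spordialad,
    )


print(extract_info_from_dict(sample_paigad))
```

Like the original, this reads only the first object in `paigad`. It could be applied with `data['paigad'].apply(lambda p: pd.Series(extract_info_from_dict(p)))`.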
### Sports Organisations dataset

In [2]:
data_org = pd.read_json('spordiorganisatsioonid.json')
columns_to_keep_org = [
    'org_id', 'nimi', 'registrikood', 'maakond', 'maakond_kood', 'kov', 'kov_kood', 'kaart_laius', 'kaart_pikkus'
]
data_org = data_org[columns_to_keep_org]

data_org.to_csv('spordiorganisatsioonid.csv', index=False)  # index=False avoids an extra 'Unnamed: 0' column on reload

# Work with CSV file
processed_data_org = pd.read_csv('spordiorganisatsioonid.csv')

# Convert all string columns to lowercase
processed_data_org = processed_data_org.apply(lambda x: x.str.lower() if x.dtype == "object" else x)
# Fill NaN values with 0
processed_data_org = processed_data_org.fillna(0)

# Cast object columns to str, then cast code columns to int
for column in processed_data_org.columns:
    if processed_data_org[column].dtype == 'object':
        processed_data_org[column] = processed_data_org[column].astype(str)
columns_to_int_org = [
    'registrikood', 'maakond_kood', 'kov_kood'
]

for column in columns_to_int_org:
    processed_data_org[column] = pd.to_numeric(processed_data_org[column], errors='coerce').fillna(0).astype(int)

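Both preprocessing cells above repeat the same steps: lowercase the text columns, replace missing values with 0, cast text columns to `str`, and cast the code columns to integers. A sketch of how these could be folded into one helper follows; `normalize_frame` is a hypothetical name, and the sample frame is invented for illustration.

```python
import pandas as pd


def normalize_frame(df, int_columns):
    """Apply the shared cleanup steps and return a new DataFrame."""
    out = df.copy()

    # Lowercase every text (object) column
    text_columns = [c for c in out.columns if out[c].dtype == "object"]
    for column in text_columns:
        out[column] = out[column].str.lower()

    # Replace missing values with 0, then make text columns uniformly str
    out = out.fillna(0)
    for column in text_columns:
        out[column] = out[column].astype(str)

    # Cast code columns to int; unparseable values become 0
    for column in int_columns:
        out[column] = pd.to_numeric(out[column], errors="coerce").fillna(0).astype(int)
    return out


# Invented sample rows; the column names follow the organisations dataset
sample = pd.DataFrame({
    "nimi": pd.Series(["Tartu SK", None], dtype=object),
    "registrikood": [80012345.0, None],
    "kov_kood": pd.Series(["x12", "45"], dtype=object),
})

result = normalize_frame(sample, ["registrikood", "kov_kood"])
print(result)
```

With such a helper, each dataset would need a single call, for example `normalize_frame(processed_data_org, columns_to_int_org)`.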
In [3]:
data_org = pd.read_json('spordiorganisatsioonid.json')

sports_by_org = data_org['sport'].dropna()
# Series.values is a NumPy array attribute, not a method, so build a plain dict first
sports_dict = sports_by_org.to_dict()  # {index: value}

values = list(sports_dict.values())

a = values[10]  # arbitrary example entry; iterate over `values` to process all of them

for key in a:
    print(a[key]['tegutsemispaigad'])  # take tegutsemispaigad for this spordiala
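
With both tables preprocessed, one possible next step toward the network analysis named in the Goals is a bipartite facility-sport graph built from the comma-separated `spordialad` column. The sketch below uses only the standard library and invented rows; `build_bipartite_edges` and `shared_sport_projection` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

# Invented rows standing in for processed_data[['rajatisid', 'spordialad']]
facility_rows = [
    (1, "football, athletics"),
    (2, "football"),
    (3, "swimming"),
    (4, "0"),  # missing sports end up as '0' after fillna(0) and astype(str)
]


def build_bipartite_edges(rows):
    """Return a set of (facility_id, sport) edges."""
    edges = set()
    for facility_id, sports in rows:
        for sport in str(sports).split(","):
            sport = sport.strip()
            if sport and sport != "0":  # skip the missing-value placeholder
                edges.add((facility_id, sport))
    return edges


def shared_sport_projection(edges):
    """Link two facilities for each sport they have in common.

    Returns {(facility_a, facility_b): number_of_shared_sports}.
    """
    facilities_by_sport = defaultdict(set)
    for facility_id, sport in edges:
        facilities_by_sport[sport].add(facility_id)

    weights = defaultdict(int)
    for facilities in facilities_by_sport.values():
        ordered = sorted(facilities)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                weights[(a, b)] += 1
    return dict(weights)


edges = build_bipartite_edges(facility_rows)
projection = shared_sport_projection(edges)
print(sorted(edges))
print(projection)
```

The same edge list could be loaded into a graph library such as NetworkX with `Graph.add_edges_from` for further analysis.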