<a href="https://colab.research.google.com/github/JaydaBubel/IronHack_FinalProject/blob/main/Dealing_with_coordinates.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Data Cleaning**

### **Dealing with Coordinates:**

For Tableau to read the geodata, it must be in lat/long format. The data comes in the projected format typical of European datasets, so it must first be converted in Python. Some important reference points:

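**European Terrestrial Reference System 1989 (ETRS89)** is a geodetic reference system used in Europe. It is fixed to the stable part of the Eurasian plate, so coordinates of European locations do not drift with plate motion. ETRS89 coincided with the global reference frame at the start of 1989 and still differs from WGS84 by roughly a meter or less, which is negligible for plotting points on a map.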

**World Geodetic System 1984 (WGS84)** is a global reference system used for GPS and mapping worldwide. It defines the shape of the Earth and coordinates based on the Earth's center of mass.

**Universal Transverse Mercator (UTM)** is a global map projection system that divides the Earth into zones and uses a flat grid to represent locations with easting and northing values. UTM is commonly used for accurate local and regional mapping.
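
As a small illustration of how the zones are laid out, the sketch below computes the standard UTM zone number from a longitude. The function name `utm_zone_for_longitude` and the sample longitude are assumptions made for this example; the sketch also ignores the special zone exceptions around Norway and Svalbard.

```python
import math

def utm_zone_for_longitude(longitude_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide, starting with zone 1 at 180 degrees west.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # Clamp the +180 edge case, which would otherwise compute as zone 61.
    return min(zone, 60)

# Illustrative longitude (hypothetical, not taken from the dataset)
print(utm_zone_for_longitude(13.4))
```

Zone 33, the zone used in the conversion code in this notebook, covers longitudes from 12°E to 18°E.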

Now, how this differs from latitude and longitude (lat/long):

Latitude and longitude (lat/long) form a geographic coordinate system that uses angles to describe locations on the Earth's surface. Latitude measures north-south position, while longitude measures east-west position. Both WGS84 and ETRS89 can express positions this way. Projected coordinates such as UTM instead represent locations in linear units (meters) as easting and northing values, which makes them better suited for measuring distances and performing precise geographic calculations in engineering, cartography, and navigation, while **lat/long is commonly used for general location descriptions.**
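
One way to see why angular coordinates are awkward for measuring distances is that the ground length of one degree of longitude shrinks toward the poles, while a meter in a projected grid stays a meter. The sketch below is a rough illustration only: it assumes a perfectly spherical Earth with a mean radius of 6371.0088 km, and the function name `degree_lengths_km` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)

def degree_lengths_km(latitude_deg):
    """Return (km per degree of latitude, km per degree of longitude)."""
    # On a sphere, one degree of latitude has the same length everywhere.
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # One degree of longitude scales with the cosine of the latitude.
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(latitude_deg))
    return km_per_deg_lat, km_per_deg_lon

for lat in (0.0, 52.5, 60.0):
    per_lat, per_lon = degree_lengths_km(lat)
    print(f"lat {lat:5.1f}: 1 deg lat = {per_lat:.1f} km, 1 deg lon = {per_lon:.1f} km")
```

At the equator the two lengths are equal; at 60° latitude a degree of longitude covers only half the distance.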

In [28]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [29]:
import pyproj
import pandas as pd
!pip install utm



I first tried the `utm` package (installed with pip), but had no luck. I then tried pyproj and was able to transform the coordinates.

In [30]:
import pandas as pd
from pyproj import Transformer

# Define the input and output file paths
input_file = '/content/drive/MyDrive/Ironhack/Final Project/231010 Sportkataster rr.xlsx'
output_file = '/content/drive/MyDrive/Ironhack/Final Project/Copy of 231010 Sportkataster rr.xlsx'

# Load the data from the Excel file (pandas reads the first sheet by default)
df = pd.read_excel(input_file)

# Source: UTM zone 33 north on the WGS84 ellipsoid (EPSG:32633), which matches
# the earlier Proj(proj='utm', zone=33, ellps='WGS84') definition.
# Target: WGS84 latitude/longitude (EPSG:4326).
# Transformer replaces the deprecated Proj(init=...) / transform() calls, so the
# warnings no longer need to be silenced.
# always_xy=True fixes the axis order to (easting, northing) -> (lon, lat).
transformer = Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

def to_float(series):
    # Values may already be numeric or may be strings with a decimal comma
    return series.astype(str).str.replace(',', '.', regex=False).astype(float)

easting = to_float(df['X-Koordinate'])
northing = to_float(df['Y-Koordinate'])

# Transform all rows at once and create the new 'lat' and 'long' columns
lon, lat = transformer.transform(easting.to_numpy(), northing.to_numpy())
df['lat'] = lat
df['long'] = lon

# Save the updated data to a new Excel file
df.to_excel(output_file, index=False)
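
The decimal-comma handling in the cell above can be checked on its own before running the full conversion. The helper below is a minimal sketch; the name `parse_decimal_comma` and the sample values are hypothetical and not taken from the dataset.

```python
def parse_decimal_comma(value):
    """Convert a number or a decimal-comma string such as '392123,45' to float."""
    if isinstance(value, (int, float)):
        # Already numeric: just normalise the type
        return float(value)
    # Drop surrounding whitespace and inner spaces, then swap the decimal comma
    text = str(value).strip().replace(" ", "")
    return float(text.replace(",", "."))

# Hypothetical sample values: one string with a decimal comma, one plain integer
print(parse_decimal_comma("392123,45"))
print(parse_decimal_comma(5819000))
```

Note that this assumes the comma is only ever a decimal separator. A value that also uses a point as thousands separator, such as `1.234,5`, would raise a `ValueError` and would need extra handling.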

As a test, this worked on the first sheet of the file. Now the same conversion with the new file, which combines both sheets of covered and uncovered sporting facilities:

In [31]:
import pandas as pd
from pyproj import Transformer

# Define the input and output file paths
input_file = '/content/drive/MyDrive/Ironhack/Final Project/14102923_Sport_Gedeckt und ungedeckt_SZ.xlsx'
output_file = '/content/drive/MyDrive/Ironhack/Final Project/Copy of 14102923_Sport_Gedeckt und ungedeckt_SZ.xlsx'

# Load the combined data from the Excel file
df = pd.read_excel(input_file)

# Source: UTM zone 33 north on the WGS84 ellipsoid (EPSG:32633), which matches
# the earlier Proj(proj='utm', zone=33, ellps='WGS84') definition.
# Target: WGS84 latitude/longitude (EPSG:4326).
# Transformer replaces the deprecated Proj(init=...) / transform() calls, so the
# warnings no longer need to be silenced.
# always_xy=True fixes the axis order to (easting, northing) -> (lon, lat).
transformer = Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

def to_float(series):
    # Values may already be numeric or may be strings with a decimal comma
    return series.astype(str).str.replace(',', '.', regex=False).astype(float)

easting = to_float(df['X-Koordinate'])
northing = to_float(df['Y-Koordinate'])

# Transform all rows at once and create the new 'lat' and 'long' columns
lon, lat = transformer.transform(easting.to_numpy(), northing.to_numpy())
df['lat'] = lat
df['long'] = lon

# Save the updated data to a new Excel file
df.to_excel(output_file, index=False)