<!-- PROJECT DESCRIPTION -->
# CSC 4260 Spring 2024 Term Project
This is the Google Colab notebook for the Spring 2024 Term Project for Advanced Data Science and Applications. The dataset for this project can be found here: [Google Smartphone Decimeter Challenge 2023](https://www.kaggle.com/competitions/smartphone-decimeter-2023)

<u>Google's Problem Description:</u>

“Precise smartphone positioning services enable many of the navigation features that we use today. Yet, current mobile phones only offer 3-5 meters of positioning accuracy. Lane-specific directions aren't always possible, which can lead to missed exits or inaccurate arrival times.  

Machine learning models could improve the accuracy of Global Navigation Satellite System (GNSS) data, enabling billions of Android users to have a more fine-tuned positioning experience.

Google's Precise Location Team, part of Android, hosted the Smartphone Decimeter Challenge in 2021 and 2022. This year, this competition is again dedicated to finding innovative research in smartphone GNSS positioning accuracy to enhance people's ability to navigate the world around them.  

Your work will help improve positioning accuracy to sub-meter level, or even centimeters. As a result, Android users could gain better lane-level navigation or carpool estimates during congestion. Beyond the car, better location data could enable augmented reality walking tours, precise agriculture via phones, and more” (Chow et al., 2023).

<u>Our Team's Problem Description:</u>

In a world dominated by smartphones, people rely heavily on the navigation apps on their devices for everyday travel. However, current smartphones offer only 3-5 meters of positioning accuracy, so lane-specific directions are not always possible, which can lead to inaccurate arrival times or missed exits while driving. Machine learning models can be trained to improve the accuracy of Global Navigation Satellite System (GNSS) data, giving billions of people a smoother driving experience. Google has challenged our team to find innovative approaches to smartphone GNSS positioning that enhance people's ability to navigate the world around them. Our work aims to improve positioning accuracy to the sub-meter or even centimeter level. As a result, users could gain better lane-level navigation and carpool estimates during heavy traffic. Beyond everyday use in the car, better location data can find applications in fields such as tourism and agriculture.  

<!-- TEAM MEMBERs -->
## Team Members
<table style="width:100%">
    <thead>
        <tr>
            <th width="25%">Cade Kennedy</th>
            <th width="25%">Harrison Peloquin</th>
            <th width="25%">Kase Johnson</th>
            <th width="25%">Robert Bingham</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td width="25%">
                <a href="https://www.linkedin.com/in/cade-kennedy-107ab7249/">
                    <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white"/>
                </a>
            </td>
             <td width="25%">
                <a href="https://www.linkedin.com/in/harrison-peloquin-2b080b24a/">
                    <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white"/>
                </a>
            </td>
            <td width="25%">
                <a href="https://www.linkedin.com/in/kase-johnson-02a974205/">
                    <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white"/>
                </a>
            </td>
            <td width="25%">
                <a href="https://www.linkedin.com/in/robert-bingham/">
                    <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white"/>
                </a>
            </td>
        </tr>
    </tbody>
</table>

## Works Cited
Ashley Chow, Dave Orendorff, Michael Fu, Mohammed Khider, Sohier Dane, Vivek Gulati. (2023). Google Smartphone Decimeter Challenge 2023. Kaggle. https://kaggle.com/competitions/smartphone-decimeter-2023

In [1]:
# PUT ALL IMPORTS HERE
import glob
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import pyproj

Here we use a glob pattern to find every `device_gnss.csv` file under the training directories. We read each file into a DataFrame, collect the frames in a list, and then concatenate them into a single DataFrame that holds the entire training set.

In [2]:
filenames = glob.glob('./sdc2023/train/2023-*/*/device_gnss.csv')
df_train = []

for file in filenames:
  df_train.append(pd.read_csv(file))

df_train = pd.concat(df_train, ignore_index=True)
df_train.head()


Unnamed: 0,MessageType,utcTimeMillis,TimeNanos,LeapSecond,TimeUncertaintyNanos,FullBiasNanos,BiasNanos,BiasUncertaintyNanos,DriftNanosPerSecond,DriftUncertaintyNanosPerSecond,...,SvVelocityYEcefMetersPerSecond,SvVelocityZEcefMetersPerSecond,SvClockBiasMeters,SvClockDriftMetersPerSecond,IsrbMeters,IonosphericDelayMeters,TroposphericDelayMeters,WlsPositionXEcefMeters,WlsPositionYEcefMeters,WlsPositionZEcefMeters
0,Raw,1693947542432,290185000000,18.0,,-1377982470247544629,0.826887,26.340462,3.145563,9.569438,...,-2100.20475,-1745.515708,-167082.196389,0.00277,0.0,5.824839,2.972454,-2678289.0,-4311251.0,3849763.0
1,Raw,1693947542432,290185000000,18.0,,-1377982470247544629,0.826887,26.340462,3.145563,9.569438,...,-1520.621173,2432.919573,-45220.428593,0.00552,0.0,12.384022,11.84536,-2678289.0,-4311251.0,3849763.0
2,Raw,1693947542432,290185000000,18.0,,-1377982470247544629,0.826887,26.340462,3.145563,9.569438,...,-1030.559521,-2694.455151,-15644.206004,-0.000696,0.0,8.264991,4.463468,-2678289.0,-4311251.0,3849763.0
3,Raw,1693947542432,290185000000,18.0,,-1377982470247544629,0.826887,26.340462,3.145563,9.569438,...,-1878.41226,-2020.986431,44812.530481,0.002188,0.0,5.538755,2.785802,-2678289.0,-4311251.0,3849763.0
4,Raw,1693947542432,290185000000,18.0,,-1377982470247544629,0.826887,26.340462,3.145563,9.569438,...,742.725535,3006.27637,-1782.227851,-0.001418,0.0,6.384073,3.153811,-2678289.0,-4311251.0,3849763.0
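
Once the files are concatenated, a row no longer records which trip or phone it came from. As a minimal sketch, assuming the `<trip>/<phone>/device_gnss.csv` layout that the glob pattern implies, both labels could be recovered from each file path. The helper name `trip_and_phone` is hypothetical and not part of the dataset or of pandas.

```python
from pathlib import PurePosixPath

def trip_and_phone(csv_path):
    """Return (trip, phone) for a path shaped like .../<trip>/<phone>/device_gnss.csv.

    Assumes POSIX-style separators, as on Colab.
    """
    parts = PurePosixPath(csv_path).parts
    if len(parts) < 3:
        raise ValueError(f"expected <trip>/<phone>/<file>, got {csv_path!r}")
    # The last component is the file name; the two before it are phone and trip.
    return parts[-3], parts[-2]

example = "./sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel7pro/device_gnss.csv"
trip, phone = trip_and_phone(example)
print(trip, phone)
```

Inside the loading loop, the two values could be stored as extra columns on each frame before it is appended, so that later cells can group by trip or phone.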


In [3]:
df_train.columns

Index(['MessageType', 'utcTimeMillis', 'TimeNanos', 'LeapSecond',
       'TimeUncertaintyNanos', 'FullBiasNanos', 'BiasNanos',
       'BiasUncertaintyNanos', 'DriftNanosPerSecond',
       'DriftUncertaintyNanosPerSecond', 'HardwareClockDiscontinuityCount',
       'Svid', 'TimeOffsetNanos', 'State', 'ReceivedSvTimeNanos',
       'ReceivedSvTimeUncertaintyNanos', 'Cn0DbHz',
       'PseudorangeRateMetersPerSecond',
       'PseudorangeRateUncertaintyMetersPerSecond',
       'AccumulatedDeltaRangeState', 'AccumulatedDeltaRangeMeters',
       'AccumulatedDeltaRangeUncertaintyMeters', 'CarrierFrequencyHz',
       'CarrierCycles', 'CarrierPhase', 'CarrierPhaseUncertainty',
       'MultipathIndicator', 'SnrInDb', 'ConstellationType', 'AgcDb',
       'BasebandCn0DbHz', 'FullInterSignalBiasNanos',
       'FullInterSignalBiasUncertaintyNanos', 'SatelliteInterSignalBiasNanos',
       'SatelliteInterSignalBiasUncertaintyNanos', 'CodeType',
       'ChipsetElapsedRealtimeNanos', 'ArrivalTimeNanosSince

In [1]:
df_train.describe()


In [None]:
df_train[['SvPositionXEcefMeters', 'SvPositionYEcefMeters', 'SvPositionZEcefMeters']].describe()

In [None]:
df_train.shape

In [None]:
df_train.info()

In [None]:
# check for how many nulls in dataset
df_train.isnull().sum()

In [None]:
# check how many zeroes in dataset
df_train.isin([0]).sum()

In [None]:
df_train.plot.scatter(x='SvPositionXEcefMeters', y='SvPositionYEcefMeters')

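The satellite positions are given as Cartesian coordinates, in meters, in the [ECEF](https://en.wikipedia.org/wiki/Earth-centered,_Earth-fixed_coordinate_system) (Earth-centered, Earth-fixed) frame. To plot them on a map, we need to convert them to longitude, latitude, and altitude.  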

In [None]:
transformer = pyproj.Transformer.from_crs(
    {"proj":'geocent', "ellps":'WGS84', "datum":'WGS84'},
    {"proj":'latlong', "ellps":'WGS84', "datum":'WGS84'},
    )

lon1, lat1, alt1 = transformer.transform(xx=df_train['SvPositionXEcefMeters'],
                                         yy=df_train['SvPositionYEcefMeters'],
                                         zz=df_train['SvPositionZEcefMeters'],
                                         radians=False)

fig = px.density_mapbox(lat=lat1, lon=lon1, z=alt1, radius=5,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="open-street-map")
fig.show()
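
To make the ECEF conversion less of a black box, here is a rough pure-Python sketch of the same kind of transformation using a standard fixed-point iteration on the WGS84 ellipsoid. This is not how pyproj implements it, and the function name `ecef_to_geodetic` is made up for illustration. It is only meant for checking a single point by hand; the example uses the `WlsPosition*EcefMeters` values from the first row shown earlier.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                  # semi-major axis, meters
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared

def ecef_to_geodetic(x, y, z, iterations=10):
    """Convert ECEF meters to (lat_deg, lon_deg, alt_m).

    Uses a simple fixed-point iteration; not valid exactly at the poles,
    where cos(lat) is zero.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                     # distance from the rotation axis
    lat = math.atan2(z, p * (1 - WGS84_E2))  # first guess: point on the ellipsoid
    for _ in range(iterations):
        # Prime vertical radius of curvature at the current latitude.
        n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - WGS84_E2 * n / (n + alt)))
    # Recompute altitude so it matches the final latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    alt = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), alt

# Receiver position estimate taken from the first row of df_train.head().
lat, lon, alt = ecef_to_geodetic(-2678289.0, -4311251.0, 3849763.0)
print(f"lat={lat:.4f} deg, lon={lon:.4f} deg, alt={alt:.1f} m")
```

The result should come out near 37° N, 122° W, close to the ground. That makes a quick sanity check on the vectorized pyproj output for the same columns.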

In [None]:
p7pro_sample_trail_gnss = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel7pro/device_gnss.csv")
p7pro_sample_trail_gt = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel7pro/ground_truth.csv")

p5_sample_trail_gnss = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel5/device_gnss.csv")
p5_sample_trail_gt = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel5/ground_truth.csv")

p6pro_sample_trail_gnss = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel6pro/device_gnss.csv")
p6pro_sample_trail_gt = pd.read_csv("/content/smartphone-decimeter-2023/sdc2023/train/2023-03-08-21-34-us-ca-mtv-u/pixel6pro/ground_truth.csv")

In [None]:
from datetime import datetime, timezone

def utc_to_human_readable(utc_millis):
    # Convert Unix time in milliseconds to a formatted UTC timestamp.
    utc_datetime = datetime.fromtimestamp(utc_millis / 1e3, tz=timezone.utc)
    return utc_datetime.strftime('%Y-%m-%d | %H:%M:%S')

def print_time_span(name, times_millis):
    # Report the duration and the start/end timestamps of one time column.
    print(f"Duration of {name} data (s):",
          (times_millis.max() - times_millis.min()) * 1e-3)
    print("Starting from", utc_to_human_readable(times_millis.min()),
          "to", utc_to_human_readable(times_millis.max()))

# Compare the GNSS (input) and ground-truth (target) time spans for each phone.
for gnss, gt in [(p7pro_sample_trail_gnss, p7pro_sample_trail_gt),
                 (p5_sample_trail_gnss, p5_sample_trail_gt),
                 (p6pro_sample_trail_gnss, p6pro_sample_trail_gt)]:
    print_time_span("input", gnss["utcTimeMillis"])
    labels = gt[["LatitudeDegrees", "LongitudeDegrees", "UnixTimeMillis"]]
    print_time_span("target", labels["UnixTimeMillis"])


In [None]:
import seaborn as sns
sns.pairplot(p7pro_sample_trail_gnss[["SvClockDriftMetersPerSecond", "AccumulatedDeltaRangeMeters",
                                  "IonosphericDelayMeters", "TroposphericDelayMeters"]])



In [None]:
sns.pairplot(p5_sample_trail_gnss[["SvClockDriftMetersPerSecond", "AccumulatedDeltaRangeMeters",
                                  "IonosphericDelayMeters", "TroposphericDelayMeters"]])

In [None]:
sns.pairplot(p6pro_sample_trail_gnss[["SvClockDriftMetersPerSecond", "AccumulatedDeltaRangeMeters",
                                  "IonosphericDelayMeters", "TroposphericDelayMeters"]])

In [None]:
filenames_gnss = glob.glob('/content/smartphone-decimeter-2023/sdc2023/train/2023-*/*/device_gnss.csv')
filenames_gt = glob.glob('/content/smartphone-decimeter-2023/sdc2023/train/2023-*/*/ground_truth.csv')
df_train_gnss = []
df_train_gt = []

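# Read each GNSS file into the GNSS list.
for file in filenames_gnss:
  df_train_gnss.append(pd.read_csv(file))

# Read each ground-truth file into the ground-truth list.
for file in filenames_gt:
  df_train_gt.append(pd.read_csv(file))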

df_train_gnss = pd.concat(df_train_gnss, ignore_index=True)
df_train_gt = pd.concat(df_train_gt, ignore_index=True)