
In this project, I predict the indoor location of users from WiFi fingerprints by combining Principal Component Analysis (PCA) with multi-label classification using skmultilearn.


Katba-Caroline/Predicting_Indoor_Location_Using_WiFi_Fingerprinting


Predicting_Indoor_Location_Using_WiFi_Fingerprinting

Table of Contents

  1. Libraries
  2. Project Description
  3. Data
  4. Approaches in the Literature
  5. Literature on Indoor Localization
  6. My Approach: Multi-label Classification
  7. Literature on Multi-Label Classification
  8. Notebook Table of Contents
  9. Results
  10. Future Improvements

Libraries

  • NumPy
  • pandas
  • Seaborn
  • Matplotlib
  • SciPy
  • scikit-learn (sklearn)
  • scikit-multilearn (skmultilearn)

Project Description

Many businesses and service providers rely on localization services to better serve their patrons. Thanks to the GPS sensors built into mobile devices, outdoor localization has been solved in a variety of ways and with high accuracy. Indoor localization, however, is still an open problem, mainly because the GPS signal is lost inside buildings. As a result, the problem has recently drawn increased attention from researchers, who have focused on cheaper software solutions rather than expensive hardware.

Indoor localization has many use cases and exhibits great potential for solving problems in:

  • Indoor navigation for humans and robots
  • Targeted advertising
  • Emergency response
  • Assisted living

Data

This data set is, unfortunately, still one of a kind. It was presented by Joaquín Torres-Sospedra, Raúl Montoliu, Adolfo Martínez-Usó, Tomar J. Arnau, Joan P. Avariento, Mauri Benedito-Bordonau, and Joaquín Huerta in "UJIIndoorLoc: A New Multi-building and Multi-floor Database for WLAN Fingerprint-based Indoor Localization Problems", Proceedings of the Fifth International Conference on Indoor Positioning and Indoor Navigation, 2014.

The UJIIndoorLoc database covers three buildings of Universitat Jaume I, each with 4 or more floors, spanning almost 110,000 m². It was created in 2013 by more than 20 users with 25 Android devices. The database consists of 19,937 training/reference records (trainingData.csv) and 1,111 validation/test records (validationData.csv).

The 529 attributes contain the WiFi fingerprint, the coordinates where it was taken, and other useful information. Each WiFi fingerprint is characterized by the detected Wireless Access Points (WAPs) and their Received Signal Strength Intensity (RSSI). Intensity values are negative integers ranging from -104 dBm (extremely poor signal) to 0 dBm. The positive value 100 denotes that a WAP was not detected. During the database creation, 520 different WAPs were detected, so each WiFi fingerprint is composed of 520 intensity values.
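Because 100 is a sentinel rather than a signal strength, distance-based steps such as PCA and k-nearest neighbors would otherwise read "not detected" as stronger than any real measurement. A minimal sketch of one way to handle this, assuming the 520 WAP columns have already been loaded into a NumPy array; the replacement value `-105` (one step below the weakest real reading) and the helper name `clean_fingerprints` are illustrative assumptions, not necessarily what the notebook does.

```python
import numpy as np

NOT_DETECTED = 100   # value the data set uses when a WAP was not detected
NO_SIGNAL = -105     # assumption: one step weaker than the weakest real reading (-104 dBm)

def clean_fingerprints(X):
    """Replace the 'not detected' marker so that larger always means stronger signal."""
    X = np.asarray(X)
    return np.where(X == NOT_DETECTED, NO_SIGNAL, X)

# Three toy fingerprints over 4 WAPs (the real data has 520 WAP columns)
X = np.array([[100, -67, -104, 100],
              [-45, 100, 100, -80],
              [100, 100, 100, 100]])
X_clean = clean_fingerprints(X)
print(X_clean)
```

After cleaning, every value is at most 0 dBm, and an undetected WAP sits just below the weakest detected one, which keeps the ordering of signal strengths intact.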

A visual mapping of the longitude and latitude data, giving an image of the campus

A visual mapping of the users who collected the data in each building

Scatter matrix of the attributes

Histogram of the attributes

Approaches in the Literature

Although data for indoor localization is unfortunately scarce, many researchers have used this data set to tackle several problems in a variety of ways. These approaches include:

  • Location identification using regression techniques
  • Floor positioning using classification
  • Building recognition using deep learning models
  • Trajectory tracking using a combination of the above methods

Literature on Indoor Localization

Here are several of the papers available on the topic that helped me in my research:

My Approach: Multi-label Classification

To my knowledge, this approach has not been applied to the problem before. I treat the problem as a classification problem, but with a twist that can save time, effort, and precious memory: I frame it as a multi-label classification problem, in which a single model simultaneously predicts the Building ID and the Floor ID for a given input. Using a combination of Principal Component Analysis (PCA) and the Multi-Label K Nearest Neighbor (MLKNN) algorithm, the model predicts the Building ID and Floor ID simultaneously with a 98.7% accuracy score and a Hamming loss of 0.003. This model can also be extended to include the Space ID.
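The two metrics quoted above measure different things. Subset accuracy counts a fingerprint as correct only if every label matches, while Hamming loss is the fraction of individual label slots that are wrong. scikit-learn's `accuracy_score` computes subset accuracy when given multi-label indicator arrays, and `hamming_loss` computes the per-slot rate; the hand-written helpers below are equivalents for illustration only. The column layout (three building columns followed by five floor columns) is an assumption made for this toy example.

```python
import numpy as np

def subset_accuracy(Y_true, Y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float(np.mean(np.all(Y_true == Y_pred, axis=1)))

def hamming(Y_true, Y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float(np.mean(Y_true != Y_pred))

# Assumed columns: [B0, B1, B2, F0, F1, F2, F3, F4]
Y_true = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0, 0, 0, 1],
                   [0, 0, 1, 1, 0, 0, 0, 0]])
Y_pred = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0, 0, 0, 1],
                   [0, 0, 1, 0, 1, 0, 0, 0]])  # floor wrong on the last sample

print(subset_accuracy(Y_true, Y_pred))  # 3 of 4 rows fully correct
print(hamming(Y_true, Y_pred))          # 2 of 32 slots wrong
```

A single wrong floor costs the last fingerprint its whole "correct" status (accuracy 0.75) but flips only 2 of 32 label slots (Hamming loss 0.0625), which is why a high accuracy score can sit alongside a very small Hamming loss.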

The difference between multi-class and multi-label classification is that "in multi-class problems the classes are mutually exclusive, whereas for multi-label problems each label represents a different classification task, but the tasks are somehow related."
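In this framing, each fingerprint gets one row of a binary label matrix in which one building bit and one floor bit are set. A minimal sketch, assuming three building columns followed by five floor columns (floors 0 to 4); the layout and the helper name `to_label_matrix` are illustrative assumptions rather than the notebook's exact encoding.

```python
import numpy as np

# Assumed layout: 3 building columns followed by 5 floor columns (floors 0-4)
N_BUILDINGS = 3
N_FLOORS = 5

def to_label_matrix(building_ids, floor_ids):
    """One row per fingerprint, with exactly one building bit and one floor bit set."""
    building_ids = np.asarray(building_ids)
    floor_ids = np.asarray(floor_ids)
    Y = np.zeros((len(building_ids), N_BUILDINGS + N_FLOORS), dtype=int)
    rows = np.arange(len(building_ids))
    Y[rows, building_ids] = 1                # building part of the label vector
    Y[rows, N_BUILDINGS + floor_ids] = 1     # floor part, offset past the building columns
    return Y

Y = to_label_matrix([0, 2, 1], [3, 4, 0])
print(Y)
```

A multi-class framing would instead fuse the two targets into one class per building/floor pair; the multi-label matrix keeps the two related tasks as separate columns that a single model predicts together.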

Although many of the previously mentioned approaches yielded excellent results, many of them predict only one target variable at a time regardless of the technique, for example predicting the building ID and the floor ID independently. In my opinion, building a separate model for each target can be costly in compute power, memory, and time, especially with neural networks, which can demand substantial computational resources. Losses in time in particular can even be deadly. Imagine a fire on the 4th floor of a particular building in a university. A model that can quickly and accurately predict how many people are in that exact area can be tremendously helpful to emergency response personnel.
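To illustrate the single-model argument, here is a minimal sketch on synthetic fingerprints in which one fitted pipeline returns both the building bits and the floor bits from a single `predict` call. It swaps in scikit-learn's `KNeighborsClassifier` as a stand-in for skmultilearn's MLkNN: `KNeighborsClassifier` accepts a 2-D label matrix and takes a plain majority vote per label column, which is simpler than MLkNN's Bayesian estimate, so this is not the author's model. The toy data generator, `N_WAPS`, `n_components=15`, and `n_neighbors=5` are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
N_WAPS = 40  # toy stand-in for the 520 WAP columns
# Assumed (building, floor) pairs: buildings 0 and 1 with 4 floors, building 2 with 5
LOCATIONS = [(b, f) for b, n_floors in [(0, 4), (1, 4), (2, 5)] for f in range(n_floors)]

def make_toy_fingerprints(samples_per_location=30):
    """Synthetic RSSI vectors: a random centroid per location plus Gaussian noise."""
    centroids = rng.uniform(-100, -30, size=(len(LOCATIONS), N_WAPS))
    X, Y = [], []
    for (b, f), c in zip(LOCATIONS, centroids):
        X.append(c + rng.normal(0, 3, size=(samples_per_location, N_WAPS)))
        label = np.zeros(8, dtype=int)  # [B0, B1, B2, F0, F1, F2, F3, F4]
        label[b] = 1
        label[3 + f] = 1
        Y.append(np.tile(label, (samples_per_location, 1)))
    return np.vstack(X), np.vstack(Y)

X, Y = make_toy_fingerprints()
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# PCA compresses the fingerprints; one kNN model then predicts all 8 label columns at once
model = make_pipeline(PCA(n_components=15), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print("subset accuracy:", accuracy_score(Y_test, Y_pred))
print("Hamming loss:", hamming_loss(Y_test, Y_pred))
```

On the real data, `X` would be the 520 cleaned WAP columns. skmultilearn's `MLkNN` also follows the fit/predict convention, so it could take the place of the `KNeighborsClassifier` step, but its constructor arguments and its compatibility with recent scikit-learn versions should be checked against its documentation.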

Literature on Multi-Label Classification

Notebook Table of Contents

  • Exploratory Data Analysis (EDA)
  • Preprocessing
  • Model Applications
  • Model Predictions
  • Model Predictions for Validation Data
    • I tested the model on the validation data set, a completely new data set that the model had never seen.

Results

  • On the validation data, the combination of Principal Component Analysis (PCA) and Multi-Label K Nearest Neighbor (MLKNN) predicts the Building ID and Floor ID simultaneously with an 81% accuracy score.
  • All predictions have been translated back into the corresponding Building ID and Floor ID values and saved to an external CSV file.
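Translating a predicted label matrix back into IDs can be sketched as taking the strongest bit in the building block and in the floor block, then writing one row per fingerprint. A minimal sketch, assuming the label columns are ordered as three building bits followed by five floor bits; the helper names and the `BUILDINGID`/`FLOOR` header (assumed to mirror the data set's column names) are illustrative, not taken from the notebook.

```python
import csv
import io

import numpy as np

N_BUILDINGS = 3  # assumption: columns are [B0, B1, B2, F0, F1, F2, F3, F4]

def decode(Y_pred):
    """Map each predicted label row back to a (building, floor) pair."""
    Y_pred = np.asarray(Y_pred)
    building = Y_pred[:, :N_BUILDINGS].argmax(axis=1)
    floor = Y_pred[:, N_BUILDINGS:].argmax(axis=1)
    return building, floor

def write_predictions(Y_pred, fileobj):
    """Write decoded predictions as CSV rows with a header."""
    building, floor = decode(Y_pred)
    writer = csv.writer(fileobj)
    writer.writerow(["BUILDINGID", "FLOOR"])
    for b, f in zip(building, floor):
        writer.writerow([int(b), int(f)])

Y_pred = np.array([[0, 1, 0, 0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0, 0, 0, 1]])
buf = io.StringIO()
write_predictions(Y_pred, buf)
print(buf.getvalue())
```

To write a real file, open it with `open(path, "w", newline="")`, as the `csv` module documentation recommends. Note that a multi-label model may predict zero or two bits in a block; `argmax` then silently picks the first maximum, so such rows may be worth flagging separately.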

Future Improvements

  • This data set is unique and hints at what is to come. Sadly, it is still one of a kind, which severely limits our ability to innovate further.
  • Additionally, most of the Space IDs (classroom information) in the validation data set were null, which severely hindered including the Space ID in the model.
  • I hope that there will be clearer documentation and more lucid examples of skmultilearn (a very powerful toolkit) in the near future.
