# Data Analysis and Visualization of ATM Locations in Saudi Arabia





### Asrar AlJuhani, Nouf AlZahrani, and Zahra'a Hamwi

<img src="img/scat1.png" align="center">

# Data Visualization

### Interactive code to explore the data
### Several charts...
### and Google Maps!

<img src="img/menu.png" align="center">

## Show the number of ATMs in a specified city
<img src="img/m1.png" align="center">

## The city that has the maximum number of ATMs
<img src="img/m2.png" align="center">

### The top cities with the largest number of ATMs
 - Helps banks avoid cities that are already saturated with ATMs and expand into other cities
<img src="img/bar.png" width="600" height="600" align="center">

### ATMs per region
<img src="img/pie.png" width="600" height="600" align="center" >

## The distribution of ATMs in Saudi Arabia
<img src="img/scat2.png" align="center">

### Heat map of the distribution of ATMs in Saudi Arabia
 - A clearer view of the scatter plot above, drawn on a real map
<img src="img/heat.png" width="650" height="650" align="center">

### Mark ATMs on Google Maps
<img src="img/gmap.png" width=600 height=600 align="center">

In [1]:
class Menu:
    """ Offer data visualization by functions
    """
    
    def city_atms(self,city):
        """ Return number of ATMs in a specific city
        """
        
    def max_no_of_atms(self):
        """ Return the city with maximum number of ATMs
        """
    
    def vis_top_atms(self,no=7):
        """ Plot a bar chart for top cities with largest number of ATMs
        """
        
    def atms_scatter(self):
        """ Scatter plot to show the distribution of ATMs in Saudi Arabia
        """
    
    def reg_pie(self):
        """ Plot pie chart for ATMs per region
        """
    
    def atm_marker(self):
        """ Mark ATMs on Google Maps
        """
    
    def heat_map(self):
        """ Show heat map for the distribution of ATMs in Saudi Arabia
        """
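The method bodies are omitted in the class above. As a rough sketch of how the first two could be written with pandas, assuming the cleaned data sits in a DataFrame with a `city_english` column (the `MenuSketch` class and the sample rows are hypothetical, not the authors' implementation):

```python
import pandas as pd

class MenuSketch:
    """Hypothetical stand-in for two of the Menu methods."""

    def __init__(self, atms):
        self.atms = atms

    def city_atms(self, city):
        """Return the number of ATMs in a specific city."""
        # Normalize the input the same way as the city column (upper case, stripped)
        return int((self.atms['city_english'] == city.strip().upper()).sum())

    def max_no_of_atms(self):
        """Return the city with the maximum number of ATMs."""
        return self.atms['city_english'].value_counts().idxmax()

# Made-up rows for illustration only
sample = pd.DataFrame({'city_english': ['RIYADH', 'RIYADH', 'JEDDAH', 'ABHA', 'RIYADH']})
menu_sketch = MenuSketch(sample)
print(menu_sketch.city_atms(' riyadh '))
print(menu_sketch.max_no_of_atms())
```

Normalizing the user's input inside `city_atms` keeps lookups consistent with the upper-cased, stripped city names produced by the cleaning steps.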

In [2]:
menu=Menu()
def print_menu():
    print("\n\nMenu:",
          "\n1- Number of ATMs in a city",
          "\n2- The city with Max. number of ATMs",
          "\n3- Bar plot for top cities with largest number of ATMs",
          "\n4- Scatter plot for ATMs' distribution in Saudi Arabia",
          "\n5- Pie chart shows ATMs per region",
          "\nEnter Q to quit")
print_menu()

option=input('Enter a number from the above menu: ')
if option=='1':
    city_u=input('Enter a city name: ')
    print('\nnumber of atms in', city_u, 'is:', menu.city_atms(city_u))


elif option=='2':
    print('\nmaximum number of atms is in:', menu.max_no_of_atms())

elif option=='3':
    menu.vis_top_atms()

elif option=='4':
    menu.atms_scatter()
    
elif option=='5':
    menu.reg_pie()
        
elif option=='Q':
    pass
else:
    print('Invalid option')



Menu: 
1- Number of ATMs in a city
2- The city with Max. number of ATMs
3- Bar plot for top cities with largest number of ATMs
4- Scatter plot for ATMs' distribution in Saudi Arabia
5- Pie chart shows ATMs per region
Enter Q to quit


Enter a number from the above menu:  3
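The cell above handles a single choice, although the menu offers `Q` to quit. One possible way to keep prompting until the user quits is a small dispatch loop. This is only a sketch: `run_menu` and its `read`/`write` parameters are hypothetical names introduced here so the loop can be exercised without a keyboard.

```python
def run_menu(actions, read=input, write=print):
    """Keep prompting until the user enters Q (hypothetical helper)."""
    while True:
        option = read('Enter a number from the above menu: ').strip().upper()
        if option == 'Q':
            break
        action = actions.get(option)
        if action is None:
            write('Invalid option')
        else:
            action()

# Usage with a stand-in action; in the notebook the dictionary could map to
# Menu methods, e.g. {'3': menu.vis_top_atms, '4': menu.atms_scatter}
log = []
answers = iter(['3', 'x', 'q'])
run_menu({'3': lambda: log.append('bar chart')},
         read=lambda prompt: next(answers),
         write=log.append)
print(log)
```

Mapping options to callables in a dictionary replaces the `if`/`elif` chain, so adding a menu entry only needs one new dictionary item.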


# Data Exploration:



In [3]:
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz,process
import gmaps
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
%matplotlib inline



In [4]:
# Import the ATM dataset
atms = pd.read_excel('dataset.xlsx')

In [5]:
atms.head(3)

| | Site | الموقع | City English | City Arabic | Reg | Brn | Site Type | X GIS Coordinates | Y GIS Coordinates |
|---|---|---|---|---|---|---|---|---|---|
| 0 | CA-AKIK BRANCH BAHA 2 | فرع العقيق الباحة 2 | AL-AQIQ- BAHA | العقيق - الباحة | AL-BAHA | Brn | Room-Window | 41.6538 | 20.2704 |
| 1 | CA-AKIK BRANCH BAHA 1 | فرع العقيق الباحة 1 | AL-AQIQ- BAHA | العقيق - الباحة | AL-BAHA | Brn | Room-Window | 41.6538 | 20.2704 |
| 2 | AKIK BRANCH BAHA 3 | فرع العقيق الباحة 3 | AL-AQIQ- BAHA | العقيق - الباحة | AL-BAHA | Brn | Room-Window | 41.6538 | 20.2704 |


In [6]:
atms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15779 entries, 0 to 15778
Data columns (total 9 columns):
Site                 11172 non-null object
الموقع               13419 non-null object
City English         11178 non-null object
City Arabic          7320 non-null object
 Reg                 8768 non-null object
Brn                  4898 non-null object
Site Type            8768 non-null object
X GIS Coordinates    15778 non-null object
Y GIS Coordinates    15778 non-null object
dtypes: object(9)
memory usage: 1.1+ MB


<img src="img/img2.png" align="center">

## Before we start cleaning

<img src="img/image.png" align="center">

- Replace spaces in column names with underscores and convert the names to lowercase

In [7]:
atms.rename(columns=lambda x:x.replace(' ','_').lower(), inplace=True)

- Rename some columns

In [8]:
atms.rename(columns={'x_gis_coordinates':'lon','y_gis_coordinates':'lat','الموقع':'site_ar','_reg':'reg'},inplace=True);
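To see why the second rename maps `'_reg'` rather than `'reg'`, the two steps can be replayed on an empty frame. The column labels below are copied from the `atms.info()` output, where ` Reg` carries a leading space; the `raw`, `step1`, and `step2` names are only for this illustration.

```python
import pandas as pd

# Subset of the original column labels; note the leading space in ' Reg'
raw = pd.DataFrame(columns=['Site', 'City English', ' Reg',
                            'X GIS Coordinates', 'Y GIS Coordinates'])

# Step 1: spaces become underscores, names become lowercase
step1 = raw.rename(columns=lambda x: x.replace(' ', '_').lower())

# Step 2: shorter names for the coordinates and the region
step2 = step1.rename(columns={'x_gis_coordinates': 'lon',
                              'y_gis_coordinates': 'lat',
                              '_reg': 'reg'})

print(list(step1.columns))
print(list(step2.columns))
```

The leading space in ` Reg` turns into a leading underscore in step 1, which is why `_reg` has to be renamed explicitly.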

# Data Cleaning

- Problems:
  - The dataset has many null values
  - Values that should be unique, such as city names, appear under several spellings
  - Some coordinates fall outside Saudi Arabia

### Data Cleaning Plan
- Drop duplicated records
- Clean Cities and Region Columns
- Clean Coordinates columns

## 1. Drop duplicated records

In [9]:
atms.shape

(15779, 9)

In [10]:
# Drop duplicated records that are identical in all column values
atms.drop_duplicates(keep='first',inplace=True)

In [11]:
atms.drop_duplicates(subset=['lon','lat'],keep='first',inplace=True);

In [12]:
atms.shape

(13068, 9)
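The two `drop_duplicates` calls behave differently, which a toy frame makes visible. The rows below are made up for illustration. Note that the second call keeps only one record per coordinate pair, so distinct ATMs that share a location (like the three branch ATMs in the `head()` output) collapse into one.

```python
import pandas as pd

# Made-up rows: two identical records, plus a different ATM at the same location
demo = pd.DataFrame({
    'site': ['BRANCH 1', 'BRANCH 1', 'BRANCH 2', 'MALL'],
    'lon':  [41.65, 41.65, 41.65, 46.70],
    'lat':  [20.27, 20.27, 20.27, 24.70],
})

# Step 1: drop rows that are identical in every column
exact = demo.drop_duplicates(keep='first')

# Step 2: keep only the first row for each (lon, lat) pair
by_coord = exact.drop_duplicates(subset=['lon', 'lat'], keep='first')

print(len(demo), len(exact), len(by_coord))
```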

## 2. Cleaning Cities:

   ### a.  Upper case all cities

In [13]:
# Upper case all cities
atms['city_english'] = atms['city_english'].str.upper()

In [14]:
# Checking the change in number of cities in city_english column
atms['city_english'].nunique()

576

   ### b.  Strip leading and trailing whitespace

In [15]:
atms['city_english']=atms['city_english'].str.strip();
atms['city_english'].nunique()

566

   ### c.  Uniform naming style: convert all (AL-) to (AL)

In [16]:
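# Note: DataFrame.replace with regex=True rewrites every string column, and the
# unanchored patterns also match inside longer names (e.g. 'JABAL OMAR' becomes 'JABALOMAR')
atms.replace(to_replace ='AL-', value = 'AL', regex = True,inplace=True)
atms.replace(to_replace ='AL ', value = 'AL', regex = True,inplace=True)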
atms['city_english'].nunique()

538

  ### d.  Check similarity using FuzzyWuzzy library

In [17]:
from fuzzywuzzy import fuzz,process

In [18]:
string1="ALKHUBAR"
string2="ALKHABRA"
fuzz.ratio(string1,string2)

75

In [20]:
def similar(column,score,limit=0):
    print(string1,"is similar to",string2);
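The body of `similar` is abbreviated above. As a minimal sketch of the idea (compare every pair of distinct names and report those scoring at least a threshold), the following uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzz.ratio`. The helper names `ratio` and `similar_pairs` and the sample city list are assumptions made for this example, not the authors' code.

```python
from difflib import SequenceMatcher
from itertools import combinations

def ratio(a, b):
    """Similarity score from 0 to 100 based on difflib's SequenceMatcher."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def similar_pairs(values, score):
    """Return (a, b, ratio) for each pair of distinct values scoring at least `score`."""
    pairs = []
    for a, b in combinations(sorted(set(values)), 2):
        r = ratio(a, b)
        if r >= score:
            pairs.append((a, b, r))
    return pairs

# Made-up list of city spellings
cities = ['ALKHUBAR', 'ALKHOBAR', 'ALKHABRA', 'RIYADH']
for a, b, r in similar_pairs(cities, score=80):
    print(a, 'is similar to', b, r)
```

A threshold matters here: ALKHUBAR and ALKHABRA are different cities, so a cut-off above their score of 75 avoids flagging them, while likely misspellings of the same city still surface for manual review.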

<img src="img/game.jpg" align="center">

### e.  Manual replacement for incorrect spellings 

<img src="img/sim1.png" align="center">

<img src="img/sim2.png" align="center">

## 3. Cleaning Region Columns:

In [21]:
#Upper case all regions
atms['reg'] = atms['reg'].str.upper()

In [22]:
atms['reg'].unique()

array(['ALBAHA', 'ALJOUF', 'ASIR', 'EASTERN', 'HAIL', 'JAZAN', 'MADINA',
       'MAKKAH', 'NAJRAN', 'NORTH BORDER', 'QASSIM', 'RIYADH', 'TABOUK',
       nan, 'WESTERN', 'SOUTHERN', 'CENTRAL', 'NORTHERN'], dtype=object)

In [23]:
# Merge duplicate region labels into a single name per region
region_map = {'NORTHERN': 'NORTH BORDER', 'CENTRAL': 'RIYADH',
              'MAKKAH': 'WESTERN', 'MADINA': 'WESTERN',
              'ALBAHA': 'SOUTHERN', 'JAZAN': 'SOUTHERN', 'NAJRAN': 'SOUTHERN'}
atms['reg'] = atms['reg'].replace(region_map)

In [24]:
# Store every city along with its region in a dictionary
grouping_by = atms[atms.reg.notnull()].groupby(['reg','city_english'])
city_reg={}

# Each group key is a (region, city) tuple
for (region, city), _ in grouping_by:
    city_reg[city] = region

In [25]:
# Fill missing regions by looking up each row's city in the city_reg dictionary
# (a vectorized fillna avoids chained assignment through .iloc)
atms['reg'] = atms['reg'].fillna(atms['city_english'].map(city_reg))

## 4. Cleaning Coordinates columns

In [None]:
def check_coord(coordinate):
    """
    Return the coordinate if it is a float strictly between -90 and 90, else None
    """
    if isinstance(coordinate, float) and -90 < coordinate < 90:
        return coordinate
    return None

In [None]:
# Keep valid decimal coordinates and replace everything else with None
# (columns are read and written in the same order so lon and lat are not swapped)
atms[['lon','lat']] = atms[['lon','lat']].applymap(check_coord)
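`check_coord` validates the format and a generic range, but it does not detect points outside Saudi Arabia, which was one of the listed problems. One possible extension is a bounding-box test. The box limits below are a rough approximation chosen for this sketch, not values taken from the dataset, and `in_saudi_box` is a hypothetical helper.

```python
# Approximate bounding box for Saudi Arabia in decimal degrees (assumption)
LAT_RANGE = (16.0, 33.0)
LON_RANGE = (34.0, 56.0)

def in_saudi_box(lon, lat):
    """Return True if lon and lat are floats inside the approximate bounding box."""
    if not isinstance(lon, float) or not isinstance(lat, float):
        return False
    return (LON_RANGE[0] <= lon <= LON_RANGE[1]
            and LAT_RANGE[0] <= lat <= LAT_RANGE[1])

print(in_saudi_box(41.6538, 20.2704))   # coordinates of the first row in the dataset
print(in_saudi_box(20.2704, 41.6538))   # same point with lon and lat swapped
print(in_saudi_box('41°39', 20.2704))   # non-decimal format
```

A box test is coarse (it also accepts points in neighbouring countries), but it catches swapped or wildly wrong coordinates cheaply.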

# End of Data Cleaning!

## Future Plan:
   - Continue the cleaning plan until the data is fully clean
   - Link the site column to Google Maps
   - Build a standard dictionary of all regions and cities in Saudi Arabia, in both Arabic and English
   - Mark the ATMs nearest to the user's location on the map
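For the last item, one common approach is to rank ATMs by great-circle distance from the user. The following is only a sketch of that idea using the haversine formula; the function names and the sample points are made up for illustration and are not part of the project.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

def nearest(user_lon, user_lat, atms, k=1):
    """Return the k (site, lon, lat) tuples closest to the user."""
    return sorted(atms,
                  key=lambda atm: haversine_km(user_lon, user_lat, atm[1], atm[2]))[:k]

# Made-up points for illustration
atms_sample = [('A', 46.70, 24.70), ('B', 39.20, 21.50), ('C', 46.68, 24.72)]
print(nearest(46.67, 24.71, atms_sample, k=2))
```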


In [None]:
print("Thanks for listening!")