<a href="https://colab.research.google.com/github/BNIA/VitalSigns/blob/main/RBIntel_Create.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome

# Housing -> RBIntel -> Data Intake and Operations

> This notebook uses data to generate a portion of BNIA's Vital Signs report.

This colab and more can be found at https://github.com/BNIA/colabs.


## What's Inside?

#### __Indicators Used__

- ✅ 30 - __dom__ - (RBIntel) Median Number of Days on the Market
- ✅ 38 - __cashsa__ - (RBIntel) Percentage of residential sales for cash
- ✅ 39 - __reosa__ - (RBIntel) Percentage of residential sales in foreclosure (REO)
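
Read off the indicator functions defined later in this notebook (`vsDom`, `vsCashsa`, `vsReosa`), the three indicators amount to the per-CSA formulas below. The symbol $S_c$ is notation introduced here, not in the original: it stands for the set of RBIntel sale records whose point falls inside CSA $c$.

```latex
\begin{aligned}
\mathrm{dom}_c    &= \operatorname{median}\{\mathrm{DaysOnMark}_i : i \in S_c\} \\
\mathrm{cashsa}_c &= 100 \cdot \frac{\left|\{\, i \in S_c : \mathrm{NewTrust1L}_i \text{ contains ``Cash''} \,\}\right|}{|S_c|} \\
\mathrm{reosa}_c  &= 100 \cdot \frac{\left|\{\, i \in S_c : \mathrm{Foreclosur}_i \text{ contains ``Y''} \,\}\right|}{|S_c|}
\end{aligned}
```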

#### __Datasets Used__

- ✔️ housing.rbintelregion_201X __(30-dom, 38-cashsa, 39-reosa -> DaysOnMark, newtrust1l, foreclosur)__



#### __Operations Performed__

- Reading in data (points/geoms)
  - Convert lat/lng columns to point coordinates
  - Geocoding address to coordinates
  - Changing coordinate reference systems
- Basic Operations
- Saving shape data
- Get Polygon Centroids
- Working with Points and Polygons
  - Map Points and Polygons
  - Get Points in Polygons
  - Create Choropleths
  - Create Heatmaps (KDE?)

## SETUP Environment:

### Import Modules

In [None]:
%%capture
! pip install -U -q PyDrive
! pip install geopy
! pip install geopandas
! pip install geoplot
! pip install dataplay
! pip install matplotlib
! pip install psycopg2-binary

In [None]:
%%capture
! apt-get build-dep -y python-psycopg2
! apt-get install -y libpq-dev
! apt-get install -y libspatialindex-dev

In [None]:
%%capture
!pip install rtree
!pip install dexplot

In [None]:
from dataplay.geoms import workWithGeometryData

In [None]:
%%capture 
# These imports will handle everything
import os
import sys
import csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame
import psycopg2
import pyproj
from pyproj import Proj, transform
# conda install -c conda-forge proj4
from shapely.geometry import Point
from shapely import wkb
from shapely.wkt import loads
# https://pypi.org/project/geopy/
from geopy.geocoders import Nominatim

# In case file is KML, enable support
import fiona
fiona.drvsupport.supported_drivers['kml'] = 'rw'
fiona.drvsupport.supported_drivers['KML'] = 'rw'

In [None]:
from IPython.display import clear_output
clear_output(wait=True)

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

import matplotlib.pyplot as plt

### Configure Environment

In [None]:
# This will just beautify the output

pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# pd.set_option('display.expand_frame_repr', False)
# pd.set_option('display.precision', 2)
# pd.reset_option('max_colwidth')
pd.set_option('max_colwidth', 20)
# pd.reset_option('max_colwidth')

## Prep Datasets

#### TPOP CSA and Baltimore

Get CSA

In [None]:
#collapse_output
#collapse_input
csa = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Tpop/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
csa = gpd.read_file(csa);
csa.head(1)

Get Baltimore City

In [None]:
url2 = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Tpop/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
csa2 = gpd.read_file(url2);
csa2['CSA2010'] = csa2['City_1'] 
csa2['OBJECTID'] = 56 
csa2 = csa2.drop(columns=['City_1'])
csa2.head()

Append (or do not append) Baltimore City. We put it at the bottom of the dataframe because the points-in-polygons (ponp) operation returns only the last matching polygon's CSA label.

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps Baltimore City (csa2) at the bottom.
csa = pd.concat([csa, csa2], ignore_index=True)

In [None]:
csa.head(3)

In [None]:
csa.tail(3)
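
As a minimal sketch of the append step on toy data: the frames `csa_toy` and `city_toy` and their values are made up for illustration and are not the real Tpop layers. It shows only that `pd.concat` with `ignore_index=True` leaves the city-wide record as the last row.

```python
import pandas as pd

# Toy stand-ins for the CSA layer and the city-wide layer (values are made up).
csa_toy = pd.DataFrame({"CSA2010": ["CSA A", "CSA B"], "tpop10": [10, 20]})
city_toy = pd.DataFrame({"CSA2010": ["Baltimore City"], "tpop10": [30]})

# Order matters: the city-wide record goes last, and the index is rebuilt as 0..n-1.
combined = pd.concat([csa_toy, city_toy], ignore_index=True)
print(combined.tail(1))
```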

In [None]:
csa.head()

In [None]:
csa.drop(columns=['Shape__Area', 'Shape__Length', 'OBJECTID'], axis=1).to_file("BCity_and_CSA.geojson", driver='GeoJSON')

### Mdprop -  [totalres](https://bniajfi.org/indicators/Housing%20And%20Community%20Development/totalres)

https://dev.bniajfi.org/indicators/Housing%20And%20Community%20Development/ownroc/2018

Baltimore City - 54.6

In [None]:
# total residential properties -> [totalres](https://bniajfi.org/indicators/Housing%20And%20Community%20Development/totalres)

totalres = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Totalres/FeatureServer/0/query?where=1%3D1&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&resultType=none&distance=0.0&units=esriSRUnit_Meter&returnGeodetic=false&outFields=totalres18%2C+CSA2010&returnGeometry=true&returnCentroid=false&featureEncoding=esriDefault&multipatchOption=xyFootprint&maxAllowableOffset=&geometryPrecision=&outSR=&datumTransformation=&applyVCSProjection=false&returnIdsOnly=false&returnUniqueIdsOnly=false&returnCountOnly=false&returnExtentOnly=false&returnQueryGeometry=false&returnDistinctValues=false&cacheHint=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&having=&resultOffset=&resultRecordCount=&returnZ=false&returnM=false&returnExceededLimitFeatures=true&quantizationParameters=&sqlFormat=none&f=pgeojson&token="

totalres = gpd.read_file(totalres); # The query above returns totalres18 and CSA2010 for each CSA.
totalres.head()

## Points In Polygon

In [None]:
def retrieveAndcleanRbIntel(filename, year):
  """Read an RBIntel shapefile, label each point with its CSA, and save the result as a CSV."""
  rbintel = gpd.read_file(filename)
  print(len(rbintel))
  # Convert to EPSG:4326
  rbintel = rbintel.to_crs(epsg=4326)

  # Keep plain x/y columns so the points can be rebuilt after the geometry column is dropped
  rbintel['x'] = rbintel.geometry.x
  rbintel['y'] = rbintel.geometry.y

  # Reference: All Points
  base = csa.plot(color='white', edgecolor='black')
  rbintel.plot(ax=base, marker='o', color='green', markersize=5)

  # Get CSA Labels for all Points.
  rbintelCSA = workWithGeometryData( 
      method='ponp', df=rbintel, polys=csa, ptsCoordCol='geometry', 
      polygonsCoordCol='geometry', polyColorCol=False, polygonsLabel='CSA2010'
  )
  rbintelCSA = rbintelCSA.drop('geometry',axis=1)
  rbintelCSA.to_csv('ponp_rbintel_'+year+'.csv', index=False)
  return rbintelCSA
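
For intuition, the points-in-polygons labeling can be sketched in plain Python with a ray-casting test. This is not dataplay's implementation: `point_in_polygon` and `label_points` are hypothetical helpers, the polygons are assumed not to overlap, and the `'false'` label for unmatched points simply mirrors the value that is filtered out later in this notebook.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: toggle 'inside' each time a rightward ray from (x, y) crosses an edge."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_points(points, polygons, missing="false"):
    """Label each point with the name of the polygon containing it (assumes no overlaps)."""
    labels = []
    for x, y in points:
        label = missing
        for name, ring in polygons:
            if point_in_polygon(x, y, ring):
                label = name
                break
        labels.append(label)
    return labels


# Two toy square "CSAs" side by side, and three points (the last lies outside both).
polygons = [
    ("CSA A", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ("CSA B", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]
points = [(1, 1), (3, 1), (9, 9)]
print(label_points(points, polygons))  # ['CSA A', 'CSA B', 'false']
```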

In [None]:
rbintel17 = retrieveAndcleanRbIntel("RBIntelRegion_2017.shp", '17');

In [None]:
rbintel18 = retrieveAndcleanRbIntel("RBIntelRegion_2018.shp", '18');

In [None]:
rbintel19 = retrieveAndcleanRbIntel("RBIntel_2019_BaltRegion.shp", '19');

In [None]:
rbintel20 = retrieveAndcleanRbIntel("RBIntel_2020_BaltRegion.shp", '20');

The 2017 and 2018 regional datasets should have a similar number of records.

## Preliminary Analysis W/ PonP.

In [None]:
import pandas as pd
import geopandas
import matplotlib.pyplot as plt

r17 = pd.read_csv("ponp_rbintel_17.csv");
r18 = pd.read_csv("ponp_rbintel_18.csv");
r19 = pd.read_csv("ponp_rbintel_19.csv");
r17 = geopandas.GeoDataFrame( r17, geometry=geopandas.points_from_xy(r17.x, r17.y)) 
r18 = geopandas.GeoDataFrame( r18, geometry=geopandas.points_from_xy(r18.x, r18.y)) 
r19 = geopandas.GeoDataFrame( r19, geometry=geopandas.points_from_xy(r19.x, r19.y)) 
r19.columns

In [None]:
cd ../

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
# https://stackoverflow.com/questions/41308763/python-pandas-df-duplicated-and-df-drop-duplicated-not-finding-all-duplicates
def exploreDs(df, yr):
  # Make sure the output folders exist before saving anything
  os.makedirs('./output/img', exist_ok=True)

  def createIndicatorAndPlotChoropleth(ddf, txt1):
    # Use the dataset passed in (ddf) and one figure per indicator,
    # so the saved maps do not draw on top of each other.
    for name, col, fn in [('DOM', 'dom', vsDom), ('Cashsa', 'cashsa', vsCashsa), ('Reosa', 'reosa', vsReosa)]:
      fig, ax = plt.subplots(1, 1)
      csa.merge( fn(ddf, name+'_'+txt1+yr), on='CSA2010' ).plot(column=col, ax=ax, legend=True)
      fig.savefig('./output/img/'+name+'_Map_Of_the_'+txt1+yr+'.jpg')

  def plotAndSave(ddf, txt):
    fig, ax = plt.subplots(1, 1)
    csa.plot(ax=ax, color='white', edgecolor='black')
    ddf.plot(ax=ax, marker='o', color='green', markersize=5)
    fig.savefig('./output/'+txt)

  print('!~~~~~~~~~~~~~~~~~~~~~!STARTING!!!!!! ',yr,' !~~~~~~~~~~~~~~~~~~~~~!')

  #
  # Drop all unneeded columns
  df = df[['CSA2010', 'AddressLin', 'geometry', 'DaysOnMark', 'NewTrust1L', 'Foreclosur', 'SoldDate']]
  # Sort the dataset by address
  #
  df = df.sort_values(by=['AddressLin']).reset_index(drop=True)
  print('Given: ', len(df), ' Records')
  # Run the indicators
  createIndicatorAndPlotChoropleth(df, 'Untouched_Records')
  # Plot it on a CSA Map
  plotAndSave(df, 'Dot_Map_Of_the_Untouched_Records_'+yr+'.jpg') 

  #
  # Drop the non-CSA records
  # Save a copy of the dropped records? 
  # - Nah. They won't affect our calculations and removing them adds clarity.
  #
  df.drop(df[df['CSA2010'] == 'false'].index, inplace=True)
  print('There are ', len(df), ' Records Remaining after Dropping Non-CSA Records')
  # Run the indicators
  createIndicatorAndPlotChoropleth(df, 'Dropped_Non_CSA_Records')
  # Plot it on a CSA Map
  plotAndSave(df, 'Dot_Map_Of_the_Dropped_Non_CSA_Records_'+yr+'.jpg') 

  #
  # The keep argument determines which duplicates (if any) to keep. 
  # - first : Drop duplicates except for the first occurrence. 
  # - last : Drop duplicates except for the last occurrence. 
  # - False : Drop all duplicates.
  # Filter the dataset for duplicates on SoldDate and AddressLin.
  #
  val1 = df.drop_duplicates(subset=['SoldDate', 'AddressLin'], keep='last').reset_index(drop=True)
  print('There are', len(val1) , ' Records Remaining after Dropping all but the last Duplicate on SoldDate & AddressLin')
  # Run the indicators
  createIndicatorAndPlotChoropleth(val1, 'Dropped_Non_CSA_Records_and_Deduped')
  # Plot it on a CSA Map
  plotAndSave(val1, 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Deduped_'+yr+'.jpg') 
  
  #
  # Save every record that has a duplicate (all copies, including the ones kept above) in a new dataset
  #
  val2 = df[df.duplicated(subset=['SoldDate', 'AddressLin'], keep=False)].reset_index(drop=True)
  print('Records Belonging to a Duplicated SoldDate & AddressLin Group: ', len(val2))
  # Run the indicators
  createIndicatorAndPlotChoropleth(val2, 'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates')
  # Plot it on a CSA Map
  plotAndSave(val2, 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Kept_Only_Duplicates_'+yr+'.jpg') 

  return ( val1, val2, df )

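# NOTE: vsDom, vsCashsa and vsReosa (see "VS Indicator Functions") must be defined before running this cell.
r177, val217, val317 = exploreDs(r17, '17')
r188, val218, val318 = exploreDs(r18, '18')
r199, val219, val319 = exploreDs(r19, '19')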
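
The two duplicate filters used in `exploreDs` behave differently, which a small sketch on made-up sales records makes visible. The column names follow the notebook; the rows are invented for illustration.

```python
import pandas as pd

# Made-up records: the first two rows are the same sale entered twice.
sales = pd.DataFrame({
    "AddressLin": ["1 Main St", "1 Main St", "2 Oak Ave", "3 Elm Rd"],
    "SoldDate":   ["2018-01-05", "2018-01-05", "2018-02-10", "2018-03-15"],
    "DaysOnMark": [30, 45, 12, 60],
})
keys = ["SoldDate", "AddressLin"]

# keep='last': one row per (SoldDate, AddressLin), keeping the last occurrence.
deduped = sales.drop_duplicates(subset=keys, keep="last")

# keep=False: flags EVERY member of a duplicate group, not only the rows that get dropped.
dupes_only = sales[sales.duplicated(subset=keys, keep=False)]

print(len(sales) - len(deduped))  # rows removed by the dedupe
print(len(dupes_only))            # rows that belong to a duplicate group
```

Because `keep=False` flags every copy, `len(dupes_only)` is larger than the number of rows that `drop_duplicates` actually removes.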

## VS Indicator Functions

In [None]:
# I want to see how many points are in each polygon.

# https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/dom/FeatureServer/layers
# https://bniajfi.org/indicators/Housing%20And%20Community%20Development/dom

def vsDom(df, yr):
  print( 'Unique Foreclosure Values', df.Foreclosur.unique() )
  print( 'Unique NewTrust1L  Values', df.NewTrust1L.unique() )

  dom = df[['DaysOnMark','CSA2010']].copy()
  dom['ponpcount'+yr] = 1
  domDenominator = dom.copy()
  dom = domDenominator.groupby('CSA2010').median(numeric_only=True) # use .median to calculate DOM.
  dom['ponpcount'+yr] = domDenominator.groupby('CSA2010').sum(numeric_only=True)['ponpcount'+yr] # use .sum to count the points.
  dom['dom'] = dom['DaysOnMark']
  dom = dom.reset_index()
  dom = dom[['dom', 'ponpcount'+yr, 'CSA2010']]
  # Next steps create Baltimore's record
  # Set aside the 'false' (non-CSA) records, if there are any
  isFalse = dom['CSA2010'].astype(str).str.lower() == 'false'
  reapp = dom[isFalse]
  dom = dom[~isFalse]
  # Create Baltimore (DataFrame.append was removed in pandas 2.0, so use pd.concat)
  bcity = pd.DataFrame([{'CSA2010': 'Baltimore City', 'ponpcount'+yr: dom['ponpcount'+yr].sum()/55, 'dom': dom['dom'].sum()/55}])
  # Reappend the 'false' records after Baltimore City
  dom = pd.concat([dom, bcity, reapp], ignore_index=True)
  dom.to_csv('./output/'+yr+'.csv', index=False)
  return dom

# print('~~~~~~~~~~18~~~~~~~~~~~~~~~~~')
# vsDom(r18, '18').tail()
# print('~~~~~~~~~~~17~~~~~~~~~~~~~~~~')
# vsDom(r17, '17').tail()
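
The core of `vsDom` is a grouped median. A minimal sketch on made-up records follows; the toy frame and the name `city_dom` are illustrative only, and the city-wide figure mirrors the notebook's approach of averaging the CSA medians, with the divisor being the number of toy CSAs rather than 55.

```python
import pandas as pd

# Made-up days-on-market records for two toy CSAs.
records = pd.DataFrame({
    "CSA2010":    ["A", "A", "A", "B", "B"],
    "DaysOnMark": [10, 20, 90, 5, 7],
})

# Median days on market and record count per CSA.
# Note: "count" ignores rows whose DaysOnMark is missing.
dom = (records.groupby("CSA2010")["DaysOnMark"]
       .agg(dom="median", ponpcount="count")
       .reset_index())

# City-wide value computed as the mean of the CSA medians.
city_dom = dom["dom"].sum() / len(dom)
print(dom)
print(city_dom)
```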

In [None]:
def inspectCashaDenominator(df):
  # Return the preview so the notebook actually displays it
  return df.head()
inspectCashaDenominator(r18)  

In [None]:
# I want to see how many points are in each polygon.

# https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/dom/FeatureServer/layers
# https://bniajfi.org/indicators/Housing%20And%20Community%20Development/dom

def vsCashsa(df, yr):
  print( 'Unique Foreclosure Values', df.Foreclosur.unique() )
  print( 'Unique NewTrust1L  Values', df.NewTrust1L.unique() )

  # Denominator
  cashsa = df[['NewTrust1L','CSA2010']].copy()
  cashsa['ponpcount'+yr] = 1
  # Sum using ALL in CSA
  cashsaDenominator = cashsa.groupby('CSA2010').sum(numeric_only=True) 

  # Filter to get applicable records for the Numerator (any value containing 'Cash')
  cashsa = cashsa[ cashsa['NewTrust1L'].str.contains('Cash', regex=False, na=False) ]
  # Save the filtered list
  cashsa.to_csv('./output/'+'cashsa_Filtered_Records'+yr+'.csv', index=False)
  print("LENGTH: ", len(cashsa) )
  # Now Sum the filtered list by csa to get our Numerator value
  cashsa = cashsa.groupby('CSA2010').sum(numeric_only=True) 
  # Create the Indicator
  cashsa['cashsa'] = cashsa['ponpcount'+yr] * 100 / cashsaDenominator['ponpcount'+yr] 
  cashsa = cashsa.reset_index()
  cashsa = cashsa[ ['CSA2010', 'ponpcount'+yr, 'cashsa' ] ]
  # Create Baltimore's Record
  # Set aside the 'false' (non-CSA) records, if there are any
  isFalse = cashsa['CSA2010'].astype(str).str.lower() == 'false'
  reapp = cashsa[isFalse]
  cashsa = cashsa[~isFalse]
  # DataFrame.append was removed in pandas 2.0, so use pd.concat
  bcity = pd.DataFrame([{'CSA2010': 'Baltimore City', 
                         'ponpcount'+yr: cashsa['ponpcount'+yr].sum(), 
                         'cashsa': cashsa['cashsa'].sum()/55 }])
  # Reappend the 'false' records after Baltimore City
  cashsa = pd.concat([cashsa, bcity, reapp], ignore_index=True)
  cashsa.to_csv('./output/'+yr+'.csv', index=False)
  return cashsa

# print('~~~~~~~~~~18~~~~~~~~~~~~~~~~~')
# vsCashsa(r18, '18').head()
# vsCashsa(r18, '18').tail()
# print('~~~~~~~~~~~17~~~~~~~~~~~~~~~~')
# vsCashsa(r17, '17').head()
# vsCashsa(r17, '17').tail()
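
The percentage itself can also be sketched as the mean of a boolean flag per CSA. This is an alternative formulation, not the notebook's code, and the `NewTrust1L` values below are invented for illustration.

```python
import pandas as pd

# Made-up records; the NewTrust1L values are hypothetical.
records = pd.DataFrame({
    "CSA2010":    ["A", "A", "A", "A", "B", "B"],
    "NewTrust1L": ["Cash", "FHA", "$Cash$", None, "Conventional", "VA"],
})

# True where the financing field mentions 'Cash'; missing values count as not cash.
is_cash = records["NewTrust1L"].str.contains("Cash", regex=False, na=False).astype(bool)

# The mean of a boolean flag is the share of True values, so times 100 gives the percentage.
cashsa = is_cash.groupby(records["CSA2010"]).mean() * 100
print(cashsa)
```

With this formulation a CSA that has no cash sales keeps a value of 0.0 rather than dropping out of the result.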

In [None]:
def inspectReosaDenominator(df, yr):
  print( 'Unique Foreclosure Values', df.Foreclosur.unique() )
  print( 'Unique NewTrust1L  Values', df.NewTrust1L.unique() )
  # Dedupe on 'AddressLin', 'SoldDate' (use the df passed in, not the global r18)
  print( "Original Dataset's Length: ", len(df))
  temp = df.drop_duplicates(subset=['AddressLin', 'SoldDate'], keep='last')
  print('Deduped Length: ', len(temp))
  print('Number of Records Removed: ', len(df) - len(temp))
  # Drop any NA AddressLin
  temp = temp.dropna(subset=['AddressLin']).copy()
  print('Number Removed After Also Dropping NA Addresses: ', len(df) - len(temp))

  temp['count'] = 1
  v1= temp.groupby(by=["CSA2010","Foreclosur"]).sum(numeric_only=True)
  v2= temp.groupby(by=["CSA2010","DaysOnMark"]).median(numeric_only=True)
  v3= temp.groupby(by=["CSA2010","NewTrust1L"]).sum(numeric_only=True) # .sort_values(by=['col1', 'col2'])
  # Keep the index when saving: it holds the group keys (CSA2010 and the grouped column)
  v1.to_csv('reosa_Deduped'+yr+'_CSAs_Unique_Foreclosure_Counts.csv')
  v2.to_csv('reosa_Deduped'+yr+'_CSAs_Unique_DOM_Counts.csv') 
  v3.to_csv('reosa_Deduped'+yr+'_CSAs_Unique_CASHSA_Counts.csv') 
  return temp

inspectReosaDenominator(r18, '18')  
# Compare DS's for each CSA where Points Exists but A ForeClosure Value Does not.

In [None]:
# https://bniajfi.org/indicators/Housing%20And%20Community%20Development/reosa/2017
def vsReosa(df, yr):
  print( 'Unique Foreclosure Values', df.Foreclosur.unique() )
  print( 'Unique NewTrust1L  Values', df.NewTrust1L.unique() )

  # Get Denominator
  reosa = df.copy()
  reosa['reosaCount'+yr] = 1
  reosaDenominator = reosa.groupby('CSA2010').sum(numeric_only=True) 
  # Filter to get applicable records for the Numerator (any value containing 'Y')
  reosa = reosa[ reosa['Foreclosur'].str.contains('Y', regex=False, na=False) ]
  # Save the filtered list
  reosa.to_csv('./output/'+'reosa_Filtered_Records'+yr+'.csv', index=False)
  print("LENGTH: ", len(reosa) )
  # Now Sum the filtered list by csa to get our Numerator value
  reosa = reosa.groupby('CSA2010').sum(numeric_only=True) 

  # Create the Indicator
  reosa['reosa'] = reosa['reosaCount'+yr] * 100 / reosaDenominator['reosaCount'+yr] 
  reosa = reosa.reset_index()
  reosa = reosa[ ['CSA2010', 'reosaCount'+yr, 'reosa' ] ]
  # Create Baltimore's Record
  # Set aside the 'false' (non-CSA) records, if there are any
  isFalse = reosa['CSA2010'].astype(str).str.lower() == 'false'
  reapp = reosa[isFalse]
  reosa = reosa[~isFalse]
  # NOTE: vsDom and vsCashsa divide by 55; the divisor here is kept as originally written.
  bcity = pd.DataFrame([{'CSA2010': 'Baltimore City', 
                         'reosaCount'+yr: reosa['reosaCount'+yr].sum(), 
                         'reosa': reosa['reosa'].sum()/(len(reosa)-1) }])
  # Reappend the 'false' records after Baltimore City (DataFrame.append was removed in pandas 2.0)
  reosa = pd.concat([reosa, bcity, reapp], ignore_index=True)
  reosa.to_csv('./output/'+yr+'.csv', index=False)
  return reosa

# print('~~~~~~~~~~18~~~~~~~~~~~~~~~~~')
# vsReosa(r18, '18').head()
# vsReosa(r18, '18').tail()
# print('~~~~~~~~~~~17~~~~~~~~~~~~~~~~')
# vsReosa(r17, '17').head()
# vsReosa(r17, '17').tail()

In [None]:
ls

# Review

In [None]:
cd output

In [None]:
!pip install ipywidgets

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual 

In [None]:
ls

In [None]:
# imgs = os.listdir('./img')
imgs = ! ls img/18_DOM*.jpg
# Strip the first 10 characters ('img/18_DOM') from each path
imgs = [name[10:] for name in imgs]
imgs

In [None]:
# Untouched_Records    Dropped_Non_CSA_Records    Dropped_Non_CSA_Records_and_Deduped    Dropped_Non_CSA_Records_and_Kept_Only_Duplicates

In [None]:
from google.colab import widgets
import numpy as np
import random
import time
from matplotlib import pylab

In [None]:
ls

In [None]:
#@title String fields
from IPython.display import Image, display

text = 'value' #@param {type:"string"}
ind = 'Reosa' #@param ['DOM', "Cashsa", "Reosa","Dot"]
inspect = 'Untouched_Records' #@param ['Dropped_Non_CSA_Records', 'Dropped_Non_CSA_Records_and_Deduped', 'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates', 'Untouched_Records'] {allow-input: true}

grid = widgets.Grid(4, 2)  
# One row per processing step, one column per year (17, 18)
steps = ['Untouched_Records', 'Dropped_Non_CSA_Records',
         'Dropped_Non_CSA_Records_and_Deduped', 'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates']
for row, step in enumerate(steps):
  for col, yr in enumerate(['17', '18']):
    filename = ind+'_Map_Of_the_'+step+yr+'.jpg'
    with grid.output_to(row, col):
      print(filename)
      display(Image(filename))

In [None]:
ls

In [None]:
# Requires show_images and mpimg, which are defined/imported in cells further down.
steps = ['Dropped_Non_CSA_Records', 'Dropped_Non_CSA_Records_and_Deduped',
         'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates', 'Untouched_Records']
names = ['Cashsa_Map_Of_the_'+step+yr+'.jpg' for step in steps for yr in ['17', '18']]
show_images([mpimg.imread(name) for name in names],
            cols = 4, saveimg='CashsaMapMatrix.jpg', titles = names)

In [None]:
# Requires show_images and mpimg, which are defined/imported in cells further down.
steps = ['Dropped_Non_CSA_Records', 'Dropped_Non_CSA_Records_and_Deduped',
         'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates', 'Untouched_Records']
names = ['DOM_Map_Of_the_'+step+yr+'.jpg' for step in steps for yr in ['17', '18']]
show_images([mpimg.imread(name) for name in names],
            cols = 4, saveimg='DOMMapMatrix.jpg', titles = names)

In [None]:
# Requires show_images and mpimg, which are defined/imported in cells further down.
steps = ['Dropped_Non_CSA_Records', 'Dropped_Non_CSA_Records_and_Deduped',
         'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates', 'Untouched_Records']
names = ['Reosa_Map_Of_the_'+step+yr+'.jpg' for step in steps for yr in ['17', '18']]
show_images([mpimg.imread(name) for name in names],
            cols = 4, saveimg='ReosaMapMatrix.jpg', titles = names)

In [None]:
# Requires show_images and mpimg, which are defined/imported in cells further down.
steps = ['Dropped_Non_CSA_Records', 'Dropped_Non_CSA_Records_and_Deduped',
         'Dropped_Non_CSA_Records_and_Kept_Only_Duplicates', 'Untouched_Records']
names = ['Dot_Map_Of_the_'+step+'_'+yr+'.jpg' for step in steps for yr in ['17', '18']]
show_images([mpimg.imread(name) for name in names],
            cols = 4, saveimg='DotMapMatrix.jpg', titles = names)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def show_images(images, cols = 1, titles = None, saveimg='test.jpg'):
    """Display a list of images in a single figure with matplotlib.
    
    Parameters
    ---------
    images: List of np.arrays compatible with plt.imshow.
    
    cols (Default = 1): Number of columns in figure (number of rows is 
                        set to np.ceil(n_images/float(cols))).
    
    titles: List of titles corresponding to each image. Must have
            the same length as images.

    saveimg: Path the finished figure is saved to.
    """
    assert((titles is None)or (len(images) == len(titles)))
    n_images = len(images)
    if titles is None: titles = ['Image (%d)' % i for i in range(1,n_images + 1)]
    # add_subplot expects integer (nrows, ncols, index)
    rows = int(np.ceil(n_images/float(cols)))
    fig = plt.figure()
    for n, (image, title) in enumerate(zip(images, titles)):
        a = fig.add_subplot(rows, cols, n + 1)
        plt.imshow(image)
        a.set_title(title)
    fig.set_size_inches(np.array(fig.get_size_inches()) * n_images)
    plt.show()
    fig.savefig(saveimg)
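
matplotlib's `add_subplot(nrows, ncols, index)` numbers its cells from 1, row by row. The grid arithmetic can be sketched without matplotlib; `grid_shape` and `cell_of` are hypothetical helper names used only for illustration.

```python
import math

def grid_shape(n_images, cols):
    """Rows needed to fit n_images into a grid with the given number of columns."""
    rows = math.ceil(n_images / cols)
    return rows, cols

def cell_of(index, cols):
    """0-based (row, col) of a 1-based, row-major subplot index."""
    return divmod(index - 1, cols)

print(grid_shape(8, 4))  # (2, 4): eight maps in four columns need two rows
print(cell_of(6, 4))     # (1, 1): the sixth image lands in the second row, second column
```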

In [None]:
import matplotlib.image as mpimg 
f, axarr = plt.subplots(4,2)

f1 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_17.jpg'
f2 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_18.jpg'
f3 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Deduped_17.jpg'
f4 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Deduped_18.jpg'
f5 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Kept_Only_Duplicates_17.jpg'
f6 = 'Dot_Map_Of_the_Dropped_Non_CSA_Records_and_Kept_Only_Duplicates_18.jpg'
f7 = 'Dot_Map_Of_the_Untouched_Records_17.jpg'
f8 = 'Dot_Map_Of_the_Untouched_Records_18.jpg'

axarr[0,0].imshow(mpimg.imread( f1 ) )
axarr[1,0].imshow(mpimg.imread( f2 ) )
axarr[2,0].imshow(mpimg.imread( f3 ) )
axarr[3,0].imshow(mpimg.imread( f4 ) )
axarr[0,1].imshow(mpimg.imread( f5 ) )
axarr[1,1].imshow(mpimg.imread( f6 ) )
axarr[2,1].imshow(mpimg.imread( f7 ) )
axarr[3,1].imshow(mpimg.imread( f8 ) )
f.set_size_inches(18.5, 10.5, forward=True)
f.savefig('test.jpg')

In [None]:
from IPython.display import Image, display
grid = widgets.Grid(4, 2) 
with grid.output_to(0, 0):
  print(f1)
  display(Image(f1))
with grid.output_to(0, 1):
  print(f2)
  display(Image(f2))
with grid.output_to(1, 0):
  print(f3)
  display(Image(f3))
with grid.output_to(1, 1):
  print(f4)
  display(Image(f4))
with grid.output_to(2, 0):
  print(f5)
  display(Image(f5))
with grid.output_to(2, 1):
  print(f6)
  display(Image(f6))
with grid.output_to(3, 0):
  print(f7)
  display(Image(f7))
with grid.output_to(3, 1):
  print(f8)
  display(Image(f8))
# grid.savefig('faces.png')  # widgets.Grid has no savefig method; the matplotlib figure above is saved with f.savefig

In [None]:
imgs

In [None]:
ls img/

In [None]:
df = pd.read_csv('_Map_Of_the_Untouched_Records');

In [None]:
ls