# Predict Waste Production for its Reduction

## Context

According to the World Bank, cities generated 2.01 billion tons of solid waste in 2016, which is around
0.74 kg per person per day! With the rapid growth of cities, this number is only expected to increase.
As cities grow, it is urgent to optimize waste processing and to provide more targeted public education
on waste management and separation. Waste collection also has an impact on air pollution.
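As a quick consistency check of the two World Bank figures (assuming metric tonnes of 1,000 kg), dividing the annual total by 365 days and then by 0.74 kg per person gives roughly 7.4 billion people. That is close to the world population in 2016, which suggests the per-person figure is a global average:

$$
\frac{2.01\times10^{9}\ \text{t/yr}\times10^{3}\ \text{kg/t}}{365\ \text{d/yr}}\approx5.51\times10^{9}\ \text{kg/d},
\qquad
\frac{5.51\times10^{9}\ \text{kg/d}}{0.74\ \text{kg/(person}\cdot\text{d)}}\approx7.4\times10^{9}\ \text{people}
$$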

## Goal & Outcome

The goal of this challenge is to identify trends in waste production and to generate insights into how to
reduce waste and optimize its collection. The expected outcome is an analysis of waste trends and an
explainable model for predicting future waste production.
Finally, propose an application (product) for the model and study its impact.

## Data

The data comes from the Austin Resource Recovery (ARR) daily report, which provides waste collection information in the following fields:
- Report Date: The date the collection information was recorded.
- Load Type: The specific type of load collected on that day.
- Load Time: Date and time of loading.
- Load Weight: The weight (in pounds) collected for each service on the day it was delivered to a diversion facility.
- Dropoff Site: The location where each type of waste is delivered for disposal, recycling or reuse:
  - TDS Landfill: the Texas Disposal Systems landfill, located at 12200 Carl Rd, Creedmoor, TX 78610.
  - Balcones Recycling: a recycling facility located at 9301 Johnny Morris Road, Austin, TX 78724.
  - MRF: a Materials Recycling Facility (such as Texas Disposal Systems or Balcones Recycling).
  - Hornsby Bend: located at 2210 FM 973, Austin, TX 78725. It accepts food scraps, yard trimmings, food-soiled paper and other materials collected by ARR, which are combined with other waste to produce nutrient-rich Dillo Dirt, used for landscaping.
- Route Type: The general category of collection service provided by Austin Resource Recovery.
- Route Number: The Austin Resource Recovery route the collecting truck was following. Each route number combines letters indicating the service type (e.g. Bulk = "BU") with a number identifying the specific route.

This information helps ARR reach its goal of transforming waste into resources while keeping our community clean. For more information, visit www.austintexas.gov/department/austin-resource-recovery
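A minimal loading sketch, assuming the raw date strings always look like the rows printed by `data.head()` below (`12/08/2020` and `12/08/2020 03:02:00 PM`). Passing an explicit `format` to `pd.to_datetime` avoids per-row format inference and raises on malformed strings instead of silently guessing. The helper name `load_waste_data` and the two format constants are hypothetical, not part of the original notebook:

```python
import io

import pandas as pd

# Assumed formats, inferred from the raw rows shown by data.head():
# "12/08/2020" and "12/08/2020 03:02:00 PM"
REPORT_DATE_FORMAT = "%m/%d/%Y"
LOAD_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def load_waste_data(source) -> pd.DataFrame:
    """Read the ARR export, parse both date columns and drop loads without a weight."""
    df = pd.read_csv(source)
    # An explicit format skips per-row inference and raises on malformed strings
    df["Report Date"] = pd.to_datetime(df["Report Date"], format=REPORT_DATE_FORMAT)
    df["Load Time"] = pd.to_datetime(df["Load Time"], format=LOAD_TIME_FORMAT)
    return df.dropna(subset=["Load Weight"])


# Two rows copied from the notebook output; the second one has no Load Weight
sample_csv = io.StringIO(
    "Report Date,Load Type,Load Time,Load Weight,Dropoff Site,Route Type,Route Number,Load ID\n"
    "12/08/2020,BULK,12/08/2020 03:02:00 PM,5220.0,TDS LANDFILL,BULK,BU13,899097\n"
    "04/25/2007,YARD TRIMMING,07/11/2021 07:01:56 AM,,HORNSBY BEND,YARD TRIMMINGS,YW04,224646\n"
)
waste = load_waste_data(sample_csv)
print(waste.dtypes)
```

On the real file this would be `data = load_waste_data("data/waste_data.csv")`, which combines the read, NaN-removal and date-conversion cells below into one step.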

# Development

In [11]:
# Install the system library and Python packages before importing them
!apt install -y libspatialindex-dev
!pip install osmnx
!pip install osmium
!pip install contextily
!pip install osm-runner

import pandas as pd
import math
import plotly.express as px
import json
import fiona
import geopandas as gpd
import requests
import numpy as np
import matplotlib.pyplot as plt
import osmnx as ox
import osmium
from shapely.geometry import shape
from datetime import datetime

In [12]:
pd.set_option('display.float_format', '{:f}'.format)

In [14]:
pd.set_option('display.max_columns', None)  

In [27]:
data = pd.read_csv("data/waste_data.csv")

In [16]:
data.head()

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 0 | 12/08/2020 | BULK | 12/08/2020 03:02:00 PM | 5220.0 | TDS LANDFILL | BULK | BU13 | 899097 |
| 1 | 12/08/2020 | RECYCLING - SINGLE STREAM | 12/08/2020 10:00:00 AM | 11140.0 | TDS - MRF | RECYCLING - SINGLE STREAM | RTAU53 | 899078 |
| 2 | 12/03/2020 | RECYCLING - SINGLE STREAM | 12/03/2020 10:34:00 AM | 10060.0 | BALCONES RECYCLING | RECYCLING - SINGLE STREAM | RHBU10 | 899082 |
| 3 | 12/07/2020 | SWEEPING | 12/07/2020 10:15:00 AM | 7100.0 | TDS LANDFILL | SWEEPER DUMPSITES | DSS04 | 899030 |
| 4 | 12/07/2020 | RECYCLING - SINGLE STREAM | 12/07/2020 04:00:00 PM | 12000.0 | TDS - MRF | RECYCLING - SINGLE STREAM | RMAU53 | 899048 |


In [17]:
data.tail()

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 740868 | 04/09/2008 | RECYCLING - PAPER | 07/11/2021 07:00:39 AM | 1080.0 | MRF | RECYCLING | RW05 | 273708 |
| 740869 | 12/01/2015 | BULK | 07/11/2021 07:05:29 AM | 9360.0 | TDS LANDFILL | STORM | HAFLDBU15 | 676651 |
| 740870 | 04/25/2007 | YARD TRIMMING | 07/11/2021 07:01:56 AM | NaN | HORNSBY BEND | YARD TRIMMINGS | YW04 | 224646 |
| 740871 | 04/09/2008 | RECYCLING - COMINGLE | 07/11/2021 07:00:39 AM | 3960.0 | MRF | RECYCLING | RW04 | 273706 |
| 740872 | 04/08/2008 | RECYCLING - COMINGLE | 07/11/2021 07:00:39 AM | 5280.0 | MRF | RECYCLING | RT24 | 273694 |


In [18]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 740873 entries, 0 to 740872
Data columns (total 8 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   Report Date   740873 non-null  object 
 1   Load Type     740873 non-null  object 
 2   Load Time     740873 non-null  object 
 3   Load Weight   668538 non-null  float64
 4   Dropoff Site  740873 non-null  object 
 5   Route Type    740873 non-null  object 
 6   Route Number  740873 non-null  object 
 7   Load ID       740873 non-null  int64  
dtypes: float64(1), int64(1), object(6)
memory usage: 45.2+ MB


In [19]:
data.describe()

|       | Load Weight | Load ID |
|---|---|---|
| count | 668538.0 | 740873.0 |
| mean | 11763.477576 | 521353.123651 |
| std | 7554.855662 | 249972.621259 |
| min | -4480.0 | 101223.0 |
| 25% | 5740.0 | 289609.0 |
| 50% | 11020.0 | 554862.0 |
| 75% | 16520.0 | 741648.0 |
| max | 1562821.0 | 929006.0 |


In [28]:
data["Load Weight"].isna().sum()

72335

In [30]:
# Remove rows with a missing Load Weight
data = data.dropna(subset=["Load Weight"])

In [31]:
data["Report Date"] = pd.to_datetime(data["Report Date"])
data["Load Time"] = pd.to_datetime(data["Load Time"])

In [32]:
data.head(100)

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-08 | BULK | 2020-12-08 15:02:00 | 5220.000000 | TDS LANDFILL | BULK | BU13 | 899097 |
| 1 | 2020-12-08 | RECYCLING - SINGLE STREAM | 2020-12-08 10:00:00 | 11140.000000 | TDS - MRF | RECYCLING - SINGLE STREAM | RTAU53 | 899078 |
| 2 | 2020-12-03 | RECYCLING - SINGLE STREAM | 2020-12-03 10:34:00 | 10060.000000 | BALCONES RECYCLING | RECYCLING - SINGLE STREAM | RHBU10 | 899082 |
| 3 | 2020-12-07 | SWEEPING | 2020-12-07 10:15:00 | 7100.000000 | TDS LANDFILL | SWEEPER DUMPSITES | DSS04 | 899030 |
| 4 | 2020-12-07 | RECYCLING - SINGLE STREAM | 2020-12-07 16:00:00 | 12000.000000 | TDS - MRF | RECYCLING - SINGLE STREAM | RMAU53 | 899048 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 96 | 2020-12-09 | BRUSH | 2020-12-09 11:27:00 | 8200.000000 | HORNSBY BEND | BRUSH | BR24 | 899245 |
| 97 | 2020-12-08 | ORGANICS | 2020-12-08 13:53:00 | 11660.000000 | ORGANICS BY GOSH | YARD TRIMMINGS-ORGANICS | OBT99 | 899223 |
| 98 | 2020-12-08 | ORGANICS | 2020-12-08 14:53:00 | 12840.000000 | ORGANICS BY GOSH | YARD TRIMMINGS-ORGANICS | OT10 | 899254 |
| 99 | 2020-11-28 | RECYCLING - SINGLE STREAM | 2020-11-28 11:10:00 | 12210.000000 | BALCONES RECYCLING | RECYCLING - SINGLE STREAM | RFAS41 | 899192 |


In [35]:
# Here we can see two typos: the year was entered incorrectly while the day and month
# were entered correctly. For now, drop every row whose Load Time is after 2021.
data = data[data["Load Time"].dt.year <= 2021]

In [None]:
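# Alternative: correct the two rows instead of dropping them (run before the filter above).
# Timestamp.replace works on a single value, so select one cell (column 2 = Load Time) at a time:
# data.iloc[354250, 2] = data.iloc[354250, 2].replace(year=2021)
# data.iloc[730958, 2] = data.iloc[730958, 2].replace(year=2021)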

In [36]:
data[pd.DatetimeIndex(data["Load Time"]).year > 2021] 

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|


In [37]:
data.dtypes

Report Date     datetime64[ns]
Load Type               object
Load Time       datetime64[ns]
Load Weight            float64
Dropoff Site            object
Route Type              object
Route Number            object
Load ID                  int64
dtype: object

In [38]:
data.sort_values(by = "Load Time", ascending = False)

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 717844 | 2020-12-21 | RECYCLING - SINGLE STREAM | 2021-12-21 12:41:00 | 6940.000000 | TDS LANDFILL | RECYCLING - SINGLE STREAM | RMAU21 | 906125 |
| 739696 | 2020-11-24 | ORGANICS | 2021-12-07 00:00:00 | 1340.000000 | ORGANICS BY GOSH | YARD TRIMMINGS-ORGANICS | OBT99 | 927983 |
| 740735 | 2021-06-28 | MIXED LITTER | 2021-07-11 07:07:45 | 3140.000000 | TDS LANDFILL | KAB | KAB02 | 927260 |
| 740751 | 2021-06-30 | GARBAGE COLLECTIONS | 2021-07-11 07:07:42 | 17200.000000 | TDS LANDFILL | GARBAGE COLLECTION | PW30 | 928229 |
| 740721 | 2020-09-23 | GARBAGE COLLECTIONS | 2021-07-11 07:07:30 | 0.000000 | TDS LANDFILL | GARBAGE COLLECTION | PAW70 | 889455 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 107125 | 2012-10-16 | BULK | 2001-10-16 15:28:00 | 8260.000000 | TDS LANDFILL | BULK | BU16 | 545996 |
| 322083 | 2012-10-16 | BULK | 2001-10-16 11:51:00 | 14080.000000 | TDS LANDFILL | BULK | BU16 | 545997 |
| 175739 | 2012-03-16 | BULK | 2001-03-16 13:33:00 | 4740.000000 | TDS LANDFILL | BULK | BU05 | 522334 |
| 550853 | 2012-03-16 | BULK | 2001-03-16 09:38:00 | 4240.000000 | TDS LANDFILL | BULK | BU05 | 522335 |
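The sorted output still contains Load Times whose year disagrees with the Report Date while the day and month match (2001-10-16 for 2012-10-16, 2021-12-21 for 2020-12-21), plus records stamped 2021-07-11 for report dates years earlier. A minimal sketch of a more general repair than dropping rows, under two stated assumptions: a year-only disagreement is a typo and the Report Date year is correct, and any other gap of more than one day is only flagged. The function `fix_year_typos` and the column `date_mismatch` are hypothetical names:

```python
import pandas as pd


def fix_year_typos(df: pd.DataFrame, max_gap_days: int = 1) -> pd.DataFrame:
    """Return a copy with year-only typos in 'Load Time' repaired and a mismatch flag added.

    Assumption: when month and day agree with 'Report Date' but the year does not,
    the year in 'Load Time' is a typo and the year of 'Report Date' is correct.
    """
    out = df.copy()
    load, report = out["Load Time"], out["Report Date"]
    year_typo = (
        (load.dt.year != report.dt.year)
        & (load.dt.month == report.dt.month)
        & (load.dt.day == report.dt.day)
    )
    if year_typo.any():
        # Timestamp.replace keeps the time of day and only swaps the year
        out.loc[year_typo, "Load Time"] = [
            ts.replace(year=int(year))
            for ts, year in zip(load[year_typo], report.dt.year[year_typo])
        ]
    # Rows still more than max_gap_days away from their report date are only flagged
    gap = (out["Load Time"].dt.normalize() - out["Report Date"]).abs()
    out["date_mismatch"] = gap > pd.Timedelta(days=max_gap_days)
    return out


# Four rows taken from the outputs above
frame = pd.DataFrame({
    "Report Date": pd.to_datetime(["2012-10-16", "2020-12-21", "2008-04-09", "2020-12-08"]),
    "Load Time": pd.to_datetime([
        "2001-10-16 15:28:00", "2021-12-21 12:41:00", "2021-07-11 07:00:39", "2020-12-08 15:02:00",
    ]),
})
fixed = fix_year_typos(frame)
print(fixed)
```

The one-day tolerance is an assumption meant to allow loads weighed shortly after midnight; the flagged rows could then be inspected or excluded before modelling.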


In [39]:
data["Load Type"].unique()

array(['BULK', 'RECYCLING - SINGLE STREAM', 'SWEEPING',
       'GARBAGE COLLECTIONS', 'YARD TRIMMING', 'BRUSH', 'ORGANICS',
       'MIXED LITTER', 'RECYCLED METAL', 'TIRES', 'DEAD ANIMAL', 'LITTER',
       'RECYCLING - COMINGLE', 'RECYCLING - PAPER', 'BAGGED LITTER',
       'MULCH', 'MATTRESS', 'RECYCLING - PLASTIC BAGS',
       'CONTAMINATED RECYCLING', 'CONTAMINATED YARD TRIMMINGS',
       'YARD TRIMMING - X-MAS TREES', 'CONTAMINATED ORGANICS'],
      dtype=object)

In [40]:
data["Load Type"].value_counts()

GARBAGE COLLECTIONS            258395
RECYCLING - SINGLE STREAM      147612
YARD TRIMMING                   69554
BULK                            40117
BRUSH                           39141
RECYCLING - PAPER               32155
RECYCLING - COMINGLE            31116
ORGANICS                        17705
SWEEPING                        16522
DEAD ANIMAL                      6854
TIRES                            3205
MIXED LITTER                     2110
LITTER                           1539
MULCH                            1344
RECYCLED METAL                   1049
BAGGED LITTER                      43
RECYCLING - PLASTIC BAGS           40
YARD TRIMMING - X-MAS TREES        16
MATTRESS                            9
CONTAMINATED RECYCLING              8
CONTAMINATED YARD TRIMMINGS         1
CONTAMINATED ORGANICS               1
Name: Load Type, dtype: int64

## By Load Type

In [None]:
weightsum_by_type = data[["Load Weight", "Load Type"]].groupby(by = "Load Type").sum()
weightmean_by_type = data[["Load Weight", "Load Type"]].groupby(by = "Load Type").mean()

In [None]:
weightsum_by_type.sort_values(by= "Load Weight", ascending= False)

In [None]:
weightmean_by_type.sort_values(by= "Load Weight", ascending= False)

Reference notes on garbage truck types and their capacities (source: https://routereadytrucks.com/blogs/know-4-major-types-garbage-trucks/):

### Front Loader Garbage Trucks

Front loaders collect waste from large containers, often called dumpsters, at industrial and commercial properties. Their design can accommodate almost any type of garbage, from slime and sludge to factory waste. They come with hydraulically controlled steel forks, which the operator uses to lift the container and empty it into the truck.

Most front loaders available in the US can lift containers weighing approximately 8,000 lbs and hold up to 40 cubic yards of trash.

### Side Loader Garbage Trucks

Side loaders are best suited to household waste, which is loaded from the side of the truck. There are two variants: automated side loaders, which use a robotic arm and need only one operator, and manual side loaders. Automated side loaders are slightly more expensive. A side loader can collect rubbish from almost 1,500 homes per day.

The size of a side loader determines how much waste it can carry. Most standard trucks can hold approximately 30,000 lbs of compacted garbage, or up to 28 cubic yards. Second-hand trucks can be budget-friendly, and manual side loaders cost less than automated ones.

### Rear Loader Garbage Trucks

Rear loaders serve both commercial and residential clients and are the most versatile type for trash collection. Their large rear opening allows large quantities of waste, such as residential garbage in bin bags of any size, to be collected in one go, and also makes it easy to dump the contents.

Most rear loaders can accommodate trash from 800 to 850 homes. Some of the bigger variants can haul up to 18 tons of garbage, and their capacity ranges from 6 to 35 cubic yards depending on their size. Used rear loaders can be bought at an affordable price, but their condition should be checked before buying.

### Roll Off Trucks

Roll off trucks are the most popular choice for mass-scale commercial trash removal and are common at demolition and construction sites. Their sturdy construction makes them a good fit for heavier materials, such as cardboard and steel. Their large roll off containers are dropped at specified locations and picked up once the clients have filled them with waste.

The truck picks up the loaded container without much effort. A roll off truck can carry approximately 20,000 lbs (10 tons), and its sturdy construction keeps it from being damaged during pickup and drop off.

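The capacities in these notes give a rough upper bound for a single load, while `data.describe()` earlier reported Load Weights from -4,480 to 1,562,821 lbs. A minimal sketch that labels implausible loads, assuming a hypothetical cut-off of 40,000 lbs (the largest figure above, 18 tons, is 36,000 lbs in US short tons) and treating zero or negative weights as suspect. `flag_implausible_weights` and `MAX_PLAUSIBLE_LBS` are illustrative names, not part of the dataset:

```python
import pandas as pd

# Assumption: the largest capacity quoted in the notes is 18 tons (36,000 lbs) for big
# rear loaders; 40,000 lbs is a hypothetical cut-off that leaves some headroom.
MAX_PLAUSIBLE_LBS = 40_000


def flag_implausible_weights(df: pd.DataFrame, max_lbs: float = MAX_PLAUSIBLE_LBS) -> pd.Series:
    """Label each load as 'ok', 'non-positive' or 'over capacity' from its Load Weight."""
    weight = df["Load Weight"]
    labels = pd.Series("ok", index=df.index, name="weight_flag")
    labels[weight <= 0] = "non-positive"
    labels[weight > max_lbs] = "over capacity"
    return labels


# Weights taken from the outputs above: a regular load, describe() min and max, a 0-lb load
loads = pd.DataFrame({"Load Weight": [5220.0, -4480.0, 1562821.0, 0.0]})
flags = flag_implausible_weights(loads)
print(flags.value_counts())
```

On the notebook data, `flag_implausible_weights(data).value_counts()` would show how many loads fall outside the assumed range before deciding whether to drop or cap them.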

## By Dropoff Sites

In [None]:
weightsum_by_dropoff = data[["Load Weight", "Dropoff Site"]].groupby(by = "Dropoff Site").sum()
weightmean_by_dropoff = data[["Load Weight", "Dropoff Site"]].groupby(by = "Dropoff Site").mean()

In [None]:
weightmean_by_dropoff.sort_values(by= "Load Weight", ascending= False)

In [None]:
weightsum_by_dropoff.sort_values(by= "Load Weight", ascending= False)

## By Route Number

### Basic Exploration

In [47]:
# Focus on 2021 in particular, since route types are likely to change over the years.
# .copy() makes data2021 an independent frame, so adding columns later does not warn.
data2021 = data[data["Load Time"].dt.year == 2021].copy()
data2021.head()

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 713883 | 2021-01-02 | ORGANICS | 2021-01-02 10:26:00 | 22120.0 | ORGANICS BY GOSH | YARD TRIMMINGS-ORGANICS | OF12 | 902139 |
| 713884 | 2021-01-04 | GARBAGE COLLECTIONS | 2021-01-04 19:15:00 | 3640.0 | TDS LANDFILL | GARBAGE COLLECTION | PAM70 | 902199 |
| 713887 | 2021-01-04 | BRUSH | 2021-01-04 15:13:00 | 3700.0 | HORNSBY BEND | BRUSH | BR21 | 902172 |
| 713889 | 2021-01-04 | YARD TRIMMING | 2021-01-04 13:12:00 | 11060.0 | HORNSBY BEND | YARD TRIMMINGS | YM04 | 902194 |
| 713892 | 2021-01-04 | YARD TRIMMING | 2021-01-04 16:55:00 | 15980.0 | HORNSBY BEND | YARD TRIMMINGS | YM03 | 902182 |


In [49]:
data2021.groupby(["Route Number"]).sum(numeric_only=True)

| Route Number | Load Weight | Load ID |
|---|---|---|
| 0BM00 | 45840.000000 | 12742458 |
| 0F16 | 24380.000000 | 2731198 |
| AFD-FIREWISE | 117300.000000 | 20007163 |
| BR01 | 338320.000000 | 53819749 |
| BR02 | 199160.000000 | 25301116 |
| ... | ... | ... |
| YW02 | 79380.000000 | 5430867 |
| YW03 | 40600.000000 | 4527166 |
| YW04 | 71740.000000 | 6336319 |
| YW05 | 49640.000000 | 4526143 |


In [55]:
data2021.groupby(["Route Number"]).mean(numeric_only=True)

| Route Number | Load Weight | Load ID |
|---|---|---|
| 0BM00 | 3274.285714 | 910175.571429 |
| 0F16 | 8126.666667 | 910399.333333 |
| AFD-FIREWISE | 5331.818182 | 909416.500000 |
| BR01 | 5734.237288 | 912199.135593 |
| BR02 | 7112.857143 | 903611.285714 |
| ... | ... | ... |
| YW02 | 13230.000000 | 905144.500000 |
| YW03 | 8120.000000 | 905433.200000 |
| YW04 | 10248.571429 | 905188.428571 |
| YW05 | 9928.000000 | 905228.600000 |


In [68]:
data2021["Route Number"].value_counts()

DSS04      318
VB-01      285
WS21       154
OCPBU23    145
BU28       123
          ... 
PAM67        1
RH05         1
RH02         1
RF03         1
NR04         1
Name: Route Number, Length: 841, dtype: int64
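According to the Data section, a route number combines letters for the service type with a number for the specific route, but the counts above also include values such as `VB-01`, `OCPBU23` and `0BM00`. A minimal sketch that splits route numbers into a prefix and an index with a regular expression, assuming the pattern "letters, optional hyphen, trailing digits"; values that do not fit (e.g. `AFD-FIREWISE`) come back as NaN. `ROUTE_PATTERN` and `split_route_number` are hypothetical names:

```python
import pandas as pd

# Assumed pattern: characters ending in a letter (service type), an optional hyphen,
# then the route digits. Non-matching values such as "AFD-FIREWISE" become NaN.
ROUTE_PATTERN = r"^(?P<route_prefix>.*?[A-Z])-?(?P<route_index>\d+)$"


def split_route_number(route_numbers: pd.Series) -> pd.DataFrame:
    """Split route numbers like 'BU13' into a prefix ('BU') and a route index ('13')."""
    return route_numbers.str.extract(ROUTE_PATTERN)


# Route numbers that appear in the outputs above
routes = pd.Series(["BU13", "VB-01", "OCPBU23", "0BM00", "AFD-FIREWISE"])
print(split_route_number(routes))
```

Joined back onto the data, for example `data2021.join(split_route_number(data2021["Route Number"]))`, the prefix could be used to group the 841 routes into a much smaller number of service types.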

In [69]:
data[data["Route Number"] == "VB-01" ].head()

Unnamed: 0,Report Date,Load Type,Load Time,Load Weight,Dropoff Site,Route Type,Route Number,Load ID
157,2020-12-08,MIXED LITTER,2020-12-08 13:49:00,6820.0,TDS LANDFILL,LITTER CONTROL,VB-01,899227
604,2020-12-11,MIXED LITTER,2020-12-11 13:44:00,4960.0,TDS LANDFILL,LITTER CONTROL,VB-01,899626
617,2020-12-11,MIXED LITTER,2020-12-11 13:52:00,7000.0,TDS LANDFILL,LITTER CONTROL,VB-01,899627
1550,2020-12-16,MIXED LITTER,2020-12-16 13:04:00,7160.0,TDS LANDFILL,LITTER CONTROL,VB-01,900590
1551,2020-12-17,MIXED LITTER,2020-12-17 07:32:00,5980.0,TDS LANDFILL,LITTER CONTROL,VB-01,900591


In [58]:
# Number of loads per route in 2021. Assigning a groupby result directly as a column does
# not work: its index (route numbers) does not align with the row labels, so the new column
# is all NaN. Keep the counts as a Series indexed by Route Number instead.
route_counts = data2021.groupby("Route Number").size()
# To attach the count to every row instead, use:
# data2021.groupby("Route Number")["Route Number"].transform("count")

In [59]:
data2021

|   | Report Date | Load Type | Load Time | Load Weight | Dropoff Site | Route Type | Route Number | Load ID |
|---|---|---|---|---|---|---|---|---|
| 713883 | 2021-01-02 | ORGANICS | 2021-01-02 10:26:00 | 22120.000000 | ORGANICS BY GOSH | YARD TRIMMINGS-ORGANICS | OF12 | 902139 |
| 713884 | 2021-01-04 | GARBAGE COLLECTIONS | 2021-01-04 19:15:00 | 3640.000000 | TDS LANDFILL | GARBAGE COLLECTION | PAM70 | 902199 |
| 713887 | 2021-01-04 | BRUSH | 2021-01-04 15:13:00 | 3700.000000 | HORNSBY BEND | BRUSH | BR21 | 902172 |
| 713889 | 2021-01-04 | YARD TRIMMING | 2021-01-04 13:12:00 | 11060.000000 | HORNSBY BEND | YARD TRIMMINGS | YM04 | 902194 |
| 713892 | 2021-01-04 | YARD TRIMMING | 2021-01-04 16:55:00 | 15980.000000 | HORNSBY BEND | YARD TRIMMINGS | YM03 | 902182 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 740867 | 2008-03-31 | RECYCLING - COMINGLE | 2021-07-11 07:00:38 | 2580.000000 | MRF | RECYCLING | RM01 | 273140 |
| 740868 | 2008-04-09 | RECYCLING - PAPER | 2021-07-11 07:00:39 | 1080.000000 | MRF | RECYCLING | RW05 | 273708 |
| 740869 | 2015-12-01 | BULK | 2021-07-11 07:05:29 | 9360.000000 | TDS LANDFILL | STORM | HAFLDBU15 | 676651 |
| 740871 | 2008-04-09 | RECYCLING - COMINGLE | 2021-07-11 07:00:39 | 3960.000000 | MRF | RECYCLING | RW04 | 273706 |


### Overall Sum & Mean Per Route

In [None]:
weightsum_by_rn = data[["Load Weight", "Route Number"]].groupby(by = "Route Number").sum()
weightmean_by_rn = data[["Load Weight", "Route Number"]].groupby(by = "Route Number").mean()

In [None]:
weightmean_by_rn.sort_values(by= "Route Number", ascending= True)

In [None]:
weightsum_by_rn.sort_values(by= "Route Number", ascending= True)

## By Time

In [None]:
list_datetime_objs = ["Year", "Month", "Day", "Hour"]

In [None]:
data["Year"] = data["Load Time"].dt.year
data["Month"] = data["Load Time"].dt.month
data["Day"] = data["Load Time"].dt.dayofweek  # day of the week, Monday = 0
data["Hour"] = data["Load Time"].dt.hour

In [None]:
data.dtypes

### Year

In [None]:
weightsum_by_year = data[["Load Weight", "Year"]].groupby(by = "Year").sum()
weightmean_by_year = data[["Load Weight", "Year"]].groupby(by = "Year").mean()

In [None]:
weightsum_by_year.sort_values(by= "Year", ascending= False)

In [None]:
fig = px.bar(weightsum_by_year, x= weightsum_by_year.index, y="Load Weight", title='Load Weight by Year')
fig.show()

### Month

In [None]:
weightsum_by_month = data[["Load Weight", "Month"]].groupby(by = "Month").sum()
weightmean_by_month = data[["Load Weight", "Month"]].groupby(by = "Month").mean()

In [None]:
weightsum_by_month.sort_values(by= "Month", ascending= False)

In [None]:
fig = px.bar(weightsum_by_month, x= weightsum_by_month.index, y="Load Weight", title='Load Weight by Month')
fig.show()

### Day

In [None]:
weightsum_by_day = data[["Load Weight", "Day"]].groupby(by = "Day").sum()
weightmean_by_day = data[["Load Weight", "Day"]].groupby(by = "Day").mean()

`Day` holds the day of the week, with Monday as 0 and Sunday as 6.
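Because the day is stored as a number, the tables and the bar chart show 0 to 6 rather than weekday names. A small sketch that relabels such an index, relying on the fact that `calendar.day_name` starts at Monday just like `Series.dt.dayofweek`; the helper `label_weekdays` is a hypothetical name, and the names it produces depend on the current locale:

```python
import calendar

import pandas as pd


def label_weekdays(frame: pd.DataFrame) -> pd.DataFrame:
    """Rename a 0-6 day-of-week index (Monday = 0) to weekday names."""
    # calendar.day_name starts at Monday, matching Series.dt.dayofweek
    return frame.rename(index=dict(enumerate(calendar.day_name)))


by_day = pd.DataFrame({"Load Weight": [1.0, 2.0]}, index=pd.Index([0, 6], name="Day"))
print(label_weekdays(by_day))
```

For the plot, `labelled = label_weekdays(weightsum_by_day)` can be passed to `px.bar` with `x=labelled.index`.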

In [None]:
weightsum_by_day.sort_values(by= "Day", ascending= False)

In [None]:
fig = px.bar(weightsum_by_day, x= weightsum_by_day.index, y="Load Weight", title='Load Weight by Day')
fig.show()

### Hour

In [None]:
weightsum_by_hour = data[["Load Weight", "Hour"]].groupby(by = "Hour").sum()
weightmean_by_hour = data[["Load Weight", "Hour"]].groupby(by = "Hour").mean()

In [None]:
weightsum_by_hour.sort_values(by= "Hour", ascending= False)

In [None]:
fig = px.bar(weightsum_by_hour, x= weightsum_by_hour.index, y="Load Weight", title='Load Weight by Hour')
fig.show()
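The year, month, day and hour views suggest that calendar effects drive much of the collected weight, and the challenge asks for an explainable prediction model. One possible starting point, swapped in here and not the author's method, is an ordinary least-squares baseline on day-of-week and month indicators: every coefficient reads directly as "extra pounds per day compared with the baseline day". This is a sketch under that assumption; `daily_totals` and `fit_calendar_model` are hypothetical names:

```python
import numpy as np
import pandas as pd


def daily_totals(df: pd.DataFrame) -> pd.Series:
    """Total collected weight (lbs) per calendar day; days without loads count as 0."""
    return df.set_index("Load Time")["Load Weight"].resample("D").sum()


def fit_calendar_model(daily: pd.Series) -> pd.Series:
    """Ordinary least squares of daily weight on day-of-week and month indicators.

    The intercept is the expected total for the baseline day (the first weekday and
    month present, i.e. a Monday in January for a full year of data); every other
    coefficient is the average difference in lbs/day from that baseline.
    """
    calendar_features = pd.DataFrame(
        {"dow": np.asarray(daily.index.dayofweek), "month": np.asarray(daily.index.month)},
        index=daily.index,
    ).astype("category")
    # One indicator column per category, dropping the first one as the baseline
    X = pd.get_dummies(calendar_features, drop_first=True, dtype=float)
    X.insert(0, "intercept", 1.0)
    coef, *_ = np.linalg.lstsq(X.to_numpy(), daily.to_numpy(dtype=float), rcond=None)
    return pd.Series(coef, index=X.columns)


# Hypothetical usage on the cleaned notebook data:
# coef = fit_calendar_model(daily_totals(data))
# coef.sort_values()
```

A linear model like this will not capture long-term growth or holidays, but it gives an interpretable baseline that more flexible models can be compared against.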

## Open Street Maps

The OSM dataset contains waste management locations listed in OpenStreetMap (OSM). Specifically, it includes OSM features with the following tags:
- "amenity:recycling"
- "amenity:waste_basket"
- "amenity:waste_transfer_station"
- "amenity:sanitary_dump_station"
- "amenity:waste_disposal"
- "industrial:scrap_yard"
- "landuse:landfill"
- "man_made:wastewater_plant"
- "water:wastewater"

It includes a poi_type, a poi_name, and all other OSM tags associated with the point (see https://taginfo.openstreetmap.org/tags).

Tag usage statistics can be explored on taginfo: https://taginfo.openstreetmap.org/
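The tag list above can be turned into a single filter. A minimal sketch, assuming each OSM key is available as a column of the (Geo)DataFrame, as `amenity` is in the polygon export used further down; keys that are missing as columns are simply skipped, because OGR's OSM driver stores tags without their own column in an `other_tags` field. `WASTE_TAGS` and `select_waste_features` are hypothetical names:

```python
import pandas as pd

# The key/value pairs from the list above
WASTE_TAGS = {
    "amenity": ["recycling", "waste_basket", "waste_transfer_station",
                "sanitary_dump_station", "waste_disposal"],
    "industrial": ["scrap_yard"],
    "landuse": ["landfill"],
    "man_made": ["wastewater_plant"],
    "water": ["wastewater"],
}


def select_waste_features(features: pd.DataFrame, tags: dict = WASTE_TAGS) -> pd.DataFrame:
    """Keep rows whose tag columns match any waste-related key/value pair.

    Keys that are not columns of `features` are skipped.
    """
    mask = pd.Series(False, index=features.index)
    for key, values in tags.items():
        if key in features.columns:
            mask |= features[key].isin(values)
    return features[mask]


# Usage on the notebook data: select_waste_features(gdf_poly)
```

Because `GeoDataFrame` subclasses `DataFrame`, the same function should work on `gdf` or `gdf_poly` and keep the geometry column.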

### Tag Info 

In [None]:
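key = "amenity"
url = "https://taginfo.openstreetmap.org/api/4/key/values"

# Most frequent values of the key (taginfo result pages start at 1)
response = requests.get(url, params={
                        'key': key,
                        'page': 1, 'rp': 100,
                        'sortname': 'count', 'sortorder': 'desc'
}, timeout=30)
response.raise_for_status()

# Use a new name so the waste DataFrame `data` is not overwritten
tag_values = response.json()['data']
df = pd.DataFrame(tag_values).set_index('value')
df[['count', 'description']].head()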

In [None]:
%%bash
mkdir -p data
wget --quiet -O data/Austin.osm.pbf \
    https://download.bbbike.org/osm/extract/planet_-98.323,29.94_-97.185,30.569.osm.pbf

In [None]:
!ogrinfo data/Austin.osm.pbf

In [None]:
%%bash
# Export only the "points" layer of the OSM extract
ogr2ogr \
  -f "GPKG" data/austin_points.gpkg \
  data/Austin.osm.pbf \
  points

In [None]:
gdf_points = gpd.read_file("data/austin_points.gpkg", driver='GPKG')
gdf_points.head(2)

In [None]:
gdf_points.loc[0,"other_tags"]
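In the points layer, most tags (including `amenity`) are not separate columns; OGR's OSM driver packs them into `other_tags` as an hstore-style string such as `"amenity"=>"recycling","recycling:glass"=>"yes"`. A minimal parsing sketch, assuming that quoting style; backslash escapes are kept as they appear rather than decoded. `parse_other_tags` is a hypothetical helper:

```python
import re

# OGR's OSM driver stores tags without their own column as an hstore-like string:
# "key"=>"value","key2"=>"value2". Backslash escapes are kept as they appear.
_TAG_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def parse_other_tags(raw) -> dict:
    """Turn one other_tags value into a {key: value} dict; missing values give {}."""
    if not isinstance(raw, str):
        return {}
    return dict(_TAG_PAIR.findall(raw))


print(parse_other_tags('"amenity"=>"recycling","recycling:glass"=>"yes"'))
```

For example, `gdf_points["other_tags"].map(parse_other_tags).map(lambda tags: tags.get("amenity"))` would give the points an `amenity` value that can be filtered like the polygons.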

In [None]:
gdf_points["geometry"]

In [None]:
%%bash
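# Export the "multipolygons" layer (areas such as landfills) as a layer named "polygons"
ogr2ogr \
  -f "GPKG" -nlt MULTIPOLYGON -nln polygons \
  data/austin_poly.gpkg \
  data/Austin.osm.pbf \
  multipolygons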

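Some features fail with `ValueError: A LinearRing must have at least 3 coordinate tuples` when loaded directly. The workaround below, which drops geometries that shapely cannot build, follows https://gis.stackexchange.com/questions/277231/geopandas-valueerror-a-linearring-must-have-at-least-3-coordinate-tuples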

In [None]:
# Read the polygon layer feature by feature
layer_file = "data/austin_poly.gpkg"
with fiona.open(layer_file, 'r') as src:
    collection = list(src)
df1 = pd.DataFrame(collection)

# Keep only the features whose geometry shapely can build
def isvalid(geom):
    try:
        shape(geom)
        return 1
    except Exception:
        return 0

df1['isvalid'] = df1['geometry'].apply(isvalid)
df1 = df1[df1['isvalid'] == 1]
collection = json.loads(df1.to_json(orient='records'))

# Convert to a GeoDataFrame
gdf = gpd.GeoDataFrame.from_features(collection)

In [None]:
gdf

In [None]:
gdf.to_csv("data/austin_poly.csv")

In [None]:
gdf.columns.nunique()

In [None]:
gdf.columns.unique()

The cells below focus on the features tagged "amenity:recycling", "amenity:waste_transfer_station", "amenity:sanitary_dump_station", "amenity:waste_disposal" or "industrial:scrap_yard".

In [None]:
gdf_poly = pd.read_csv("data/austin_poly.csv")

In [None]:
gdf_poly=gdf_poly.replace('NaN', np.nan)

In [None]:
# Keep rows that have an amenity tag; comparing with `!= np.nan` or `!= None` is always True
gdf_poly[gdf_poly["amenity"].notna()]

In [None]:
waste_amenities = ["recycling", "waste_transfer_station", "sanitary_dump_station", "waste_disposal"]
gdf_poly[gdf_poly["amenity"].isin(waste_amenities)]