From 8faec4567ac39d2394e47915da2abdd54ccea383 Mon Sep 17 00:00:00 2001 From: Fletcher Foti Date: Wed, 13 Aug 2014 11:55:04 -0700 Subject: [PATCH 1/5] dframe_explorer docs are now done --- docs/examples.rst | 42 +++++---------------- docs/maps/dframe_explorer.rst | 9 ----- docs/maps/index.rst | 63 +++++++++++++++++++++++++++++--- urbansim/maps/dframe_explorer.py | 37 +++++++++++++++++++ 4 files changed, 104 insertions(+), 47 deletions(-) delete mode 100644 docs/maps/dframe_explorer.rst diff --git a/docs/examples.rst b/docs/examples.rst index 110b749c..9eca7766 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -208,7 +208,7 @@ One thing to note is the `autoreload magic `_. +A sample estimation workflow is available `in the San Francisco example `_. This notebook estimates all of the models in the example that need estimation (because they are statistical models). In fact, every cell simply calls the `sim.run `_ method with one of the names of the model entry points defined in ``models.py``. The ``sim.run`` method resolves all of the dependencies and prints the output of the model estimation in the result cell of the IPython Notebook. Note that the hedonic models are estimated first, then simulated, and then the location choice models are estimated since the hedonic models are dependencies of the location choice models. In other words, the ``rsh_simulate`` method is configured to create the ``residential_sales_price`` column which is then a right hand side variable in the ``hlcm_estimate`` model (because residential price is theorized to impact the location choices of households). @@ -219,48 +219,24 @@ A sample simulation workflow (for a complete UrbanSim simulation is available `h This notebook is possibly even simpler than the estimation workflow as it has only one substantive cell which runs all of the available models in the appropriate sequence. Passing a range of years will run the simulation for multiple years (the example simply runs the simulation for a single year). Other parameters are available to the `sim.run `_ method which write the output to an HDF5 file. +.. _exploration-workflow: + Exploration Workflow ~~~~~~~~~~~~~~~~~~~~ UrbanSim now also provides a method to interactively explore UrbanSim inputs and outputs using web mapping tools, and the `exploration notebook `_ demonstrates how to set up and use this interactive display tool. -This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``househlds``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `dframe_explorer.start `_ method. - -The dframe_explorer takes a dictionary of DataFrames which are joined to a set of shapes for visualization. The most common case is to use a `geojson `_ format shapefile of zones to join to any DataFrame that has a zone_id (the dframe_explorer module does the join for you). Here the center and zoom level are set for the map, the name of geojson shapefile is passed, as are the join keys both in the geojson file and the DataFrames. - -Once that is accomplished, the cell can be executed and the IPython Notebook is now running a web service which will respond to queries from a web browser. Try is out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. 
Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:
-
-.. image:: https://raw.githubusercontent.com/synthicity/urbansim/high-level-docs/docs/screenshots/dframe_explorer_screenshot.png
-
-Here is what each dropdown on the web page does:
-
-* The first dropdown gives the names of the DataFames you have passed ``dframe_explorer.start``
-* The second dropdown allows you to choose between each of the columns in the DataFrame with the name from the first dropdown
-* The third dropdown selects the color scheme from the `colorbrewer `_ color schemes
-* The fourth dropdown sets ``quantile`` and ``equal_interval`` `color schemes `_
-* The fifth dropdown selects the Pandas aggregation method to use
-* The sixth dropdown executes the `.query `_ method on the Pandas DataFrame in order to filter the input data
-* The seventh dropdown executes the `.eval `_ method on the Pandas DataFrame in order to create simple computed variables that are not already columns on the DataFrame.
-
-So what is this doing? The web service is translating the drop downs to a simple interactive Pandas statement, for example: ::
-
-    df.groupby('zone_id')['residential_units'].sum()
-
-The notebook will print out each statement it executes. The website then transparently joins the output Pandas series to the shapes and create an interactive *slippy* web map using the `Leaflet `_ Javasript library. The code for this map is really `quite simple `_ - feel free to browse the code and add functionality as required.
-
-To be clear, the website is performing a Pandas aggregation on the fly. If you have a buildings DataFrame with millions of records, Pandas will ``groupby`` the ``zone_id`` and perform an aggregation of your choice. This is designed to give you a quickly navigable map interface to understand the underlying disaggregate data, similar to that supplied by commercial projects such as `Tableau `_.
-
-As a concrete example, note that the ``households`` table has a ``zone_id`` and is thus available for aggregation in ``dframe_explorer``. Since the web service is running aggregations on the *disaggregate* data, clicking to the ``households`` table and ``persons`` attribute and an aggregation of ``sum`` will run: ::
+This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``households``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `start `_ method.

-    households.groupby('zone_id').persons.sum()
+See :ref:`dframe-explorer` for detailed information on how to call the ``start`` method and what queries the website is performing.

-This computes the sum of persons in each household by zone, or more simply, the population of each zone. If the aggregation is changed to mean, the service will run: ::
+Once the ``start`` method has been called, the IPython Notebook is running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook.
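+
+For reference, a minimal sketch of the call that kicks off the service (not
+taken verbatim from the example notebook) looks like the following, where
+``buildings_df`` and ``households_df`` are placeholder DataFrames and only
+arguments visible in the ``start`` signature are shown: ::
+
+    from urbansim.maps import dframe_explorer
+
+    # dictionary of table name -> DataFrame; each frame carries the join
+    # column (e.g. zone_id) that matches the id field of the geojson shapes
+    d = {'buildings': buildings_df, 'households': households_df}
+
+    # takes over the thread and serves http://localhost:8765 until interrupted
+    dframe_explorer.start(d, port=8765)
+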
Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following: - households.groupby('zone_id').persons.mean() +.. image:: screenshots/dframe_explorer_screenshot.png -What does this compute exactly? It computes the average number of persons per household in each zone, or the average household size by zone. +See :ref:`dframe-explorer-website` for a description of how to use the website that is rendered. -Because this is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropraite cell again. Now map exploration is simply another interactive step in your data processing workflow. +Because the web service is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow. Specifying Scenario Inputs -------------------------- diff --git a/docs/maps/dframe_explorer.rst b/docs/maps/dframe_explorer.rst deleted file mode 100644 index 98a078d1..00000000 --- a/docs/maps/dframe_explorer.rst +++ /dev/null @@ -1,9 +0,0 @@ -DataFrame Explorer -================== - -API ---- - -.. automodule:: urbansim.maps.dframe_explorer - :members: - :undoc-members: diff --git a/docs/maps/index.rst b/docs/maps/index.rst index b0f84bd8..4edadab9 100644 --- a/docs/maps/index.rst +++ b/docs/maps/index.rst @@ -1,7 +1,60 @@ -Mapping Utilities -================= +.. _dframe-explorer: -.. toctree:: - :maxdepth: 2 +DataFrame Explorer +================== - dframe_explorer +Introduction +------------ + +The DataFrame Explorer is used to create a web service within the IPython +Notebook which responds to queries from a web browser. The REST API is +undocumented as the user does not interact with that API. Simply call the +``start`` method below and then open `http://localhost:8765 +`_ in any web browser. + +See :ref:`exploration-workflow` for sample code from the San Francisco case + study. + +The dframe_explorer takes a dictionary of DataFrames which are joined to a set of shapes for visualization. The most common case is to use a `geojson `_ format shapefile of zones to join to any DataFrame that has a zone_id (the dframe_explorer module does the join for you). Then set the center and zoom level for the map, the name of the geojson shapefile is passed, and the join keys both in the geojson file and the DataFrames. Below is a screenshot of the result as displayed in your web browser. + +.. image:: ../screenshots/dframe_explorer_screenshot.png + +.. 
_dframe-explorer-website:
+
+Website Description
+-------------------
+
+Here is what each dropdown on the web page does:
+
+* The first dropdown gives the names of the DataFrames you have passed to ``dframe_explorer.start``
+* The second dropdown allows you to choose between each of the columns in the DataFrame with the name from the first dropdown
+* The third dropdown selects the color scheme from the `colorbrewer `_ color schemes
+* The fourth dropdown sets ``quantile`` and ``equal_interval`` `color schemes `_
+* The fifth dropdown selects the Pandas aggregation method to use
+* The sixth dropdown executes the `.query `_ method on the Pandas DataFrame in order to filter the input data
+* The seventh dropdown executes the `.eval `_ method on the Pandas DataFrame in order to create simple computed variables that are not already columns on the DataFrame.
+
+So what is this doing? The web service is translating the drop downs to a simple interactive Pandas statement, for example: ::
+
+    df.groupby('zone_id')['residential_units'].sum()
+
+The web service will print out each statement it executes. The website then transparently joins the output Pandas series to the shapes and creates an interactive *slippy* web map using the `Leaflet `_ JavaScript library. The code for this map is really `quite simple `_ - feel free to browse the code and add functionality as required.
+
+To be clear, the website is performing a Pandas aggregation on the fly. If you have a buildings DataFrame with millions of records, Pandas will ``groupby`` the ``zone_id`` and perform an aggregation of your choice. This is designed to give you a quickly navigable map interface to understand the underlying disaggregate data, similar to that supplied by commercial projects such as `Tableau `_.
+
+As a concrete example, note that the ``households`` table has a ``zone_id`` and is thus available for aggregation in ``dframe_explorer``. Since the web service is running aggregations on the *disaggregate* data, clicking on the ``households`` table and ``persons`` attribute and an aggregation of ``sum`` will run: ::
+
+    households.groupby('zone_id').persons.sum()
+
+This computes the sum of persons in each household by zone, or more simply, the population of each zone. If the aggregation is changed to mean, the service will run: ::
+
+    households.groupby('zone_id').persons.mean()
+
+What does this compute exactly? It computes the average number of persons per household in each zone, or the average household size by zone.
+
+DataFrame Explorer API
+----------------------
+
+.. automodule:: urbansim.maps.dframe_explorer
+   :members: start
+   :undoc-members:
diff --git a/urbansim/maps/dframe_explorer.py b/urbansim/maps/dframe_explorer.py
index 3c98725e..a8fd0be8 100644
--- a/urbansim/maps/dframe_explorer.py
+++ b/urbansim/maps/dframe_explorer.py
@@ -73,6 +73,43 @@ def start(views, port=8765, host='localhost', testing=False):
+    """
+    Start the web service which serves the Pandas queries and generates the
+    HTML for the map. You will need to open a web browser and navigate to
+    http://localhost:8765 (or the specified port).
+
+    Parameters
+    ----------
+    views : Python dictionary
+        This is the data that will be displayed in the maps. Keys are strings
+        (table names) and values are dataframes.
Each data frame should have a + column with the name specified as join_name below + center : a Python list with two floats + The initial latitude and longitude of the center of the map + zoom : int + The initial zoom level of the map + shape_json : str + The path to the geojson file which contains that shapes that will be + displayed + geom_name : str + The field name from the JSON file which contains the id of the geometry + join_name : str + The column name from the dataframes passed as views (must be in each + view) which joins to geom_name in the shapes + precision : int + The precision of values to show in the legend on the map + port : int + The port for the web service to respond on + host : str + The hostname to run the web service from + testing : bool + Whether to print extra debug information + + Returns + ------- + Does not return - takes over control of the thread and responds to + queries from a web browser + """ global DFRAMES, CONFIG DFRAMES = {str(k): views[k] for k in views} From 042284d17e51767dad8543c57be5c8dda27236b6 Mon Sep 17 00:00:00 2001 From: Fletcher Foti Date: Wed, 13 Aug 2014 14:31:47 -0700 Subject: [PATCH 2/5] sqftproforma is now documented --- docs/developer/developer.rst | 9 --- docs/developer/index.rst | 119 ++++++++++++++++++++++++++++- docs/developer/sqftproforma.rst | 9 --- urbansim/developer/developer.py | 34 +++++++-- urbansim/developer/sqftproforma.py | 28 ++++--- 5 files changed, 157 insertions(+), 42 deletions(-) delete mode 100644 docs/developer/developer.rst delete mode 100644 docs/developer/sqftproforma.rst diff --git a/docs/developer/developer.rst b/docs/developer/developer.rst deleted file mode 100644 index 3b6f39b5..00000000 --- a/docs/developer/developer.rst +++ /dev/null @@ -1,9 +0,0 @@ -Developer -========= - -API ---- - -.. automodule:: urbansim.developer.developer - :members: - :undoc-members: diff --git a/docs/developer/index.rst b/docs/developer/index.rst index fb741071..ae5617d9 100644 --- a/docs/developer/index.rst +++ b/docs/developer/index.rst @@ -1,8 +1,123 @@ Real Estate Development Models ============================== +The real estate development models included in this module are designed to +implement pencil out pro formas, which generally measure the cash inflows and +outflows of a potential investment (in this case, real estate development) +with the outcome being some measure of profitability or return on investment. +Pro formas would normally be performed in a spreadsheet program (e.g. Excel), +but are implemented in vectorized Python implementations so that many (think +millions) of pro formas can be performed at a time. + +The functionality is split into two modules - the square foot pro forma and +the developer model - as there are many use cases that call for the pro formas +without the developer model. The ``sqftproforma`` module computes real +estate feasibility for a set of parcels dependent on allowed uses, prices, +and building costs, but does not actually *build* anything (both figuratively +and literally). The ``developer model`` decides how much to build, +then picks among the set of feasible buildings attempting to meet demand, +and adds the new buildings to the set of current buildings. Thus +``developer model`` is primarily useful in the context of an urban forecast. + +An example of the sample code required to generate the set of feasible +buildings is shown below. This code comes from the ``utils`` module of the +current `sanfran_urbansim `_ demo. 
Notice that the SqFtProForma is +first initialized and a DataFrame of parcels is tested for feasibliity (each +individual parcel is tested for feasibility). Each *use* (e.g. retail, office, +residential, etc) is assigned a price per parcel, typically from empirical data +of currents rents and prices in the city but can be the result of forecast +rents and prices as well. The ``lookup`` function is then called with a +specific building ``form`` and the pro forma returns whether that form is +profitable for each parcel. + +A large number of assumptions enter in to the computation of profitability +and these are set in the `SqFtProFormaConfig <#urbansim.developer.sqftproforma.SqFtProFormaConfig>`_ module, and include such things +as the set of ``uses`` to model, the mix of ``uses`` into ``forms``, +the impact of parking requirements, parking costs, +building costs at different heights (taller buildings typically requiring +more expensive construction methods), the profit ratio required, +the building efficiency, parcel coverage, and cap rate to name a few. See +the API documentation for the complete list and detailed descriptions. + +Note that unit mixes don't typically enter in to the square foot pro forma +(hence the name). After discussions with numerous real estate developers, +we found that most developers thought first and foremost in terms of price and +cost per square foot and the arbitrage between, and second in terms of the +translation to unit sizes and mixes in a given market (also larger and +smaller units of a given unit type will typically lower and raise their +prices as stands to reason). Since getting data on unit mixes in the current +building stock is extremely difficult, most feasibility computations here +happen on a square foot basis and the ``developer`` model below handles the +translation to units. :: + + pf = sqftproforma.SqFtProForma() + + df = parcels.to_frame() + + # add prices for each use + for use in pf.config.uses: + df[use] = parcel_price_callback(use) + + # convert from cost to yearly rent + if residential_to_yearly: + df["residential"] *= pf.config.cap_rate + + d = {} + for form in pf.config.forms: + print "Computing feasibility for form %s" % form + d[form] = pf.lookup(form, df[parcel_use_allowed_callback(form)]) + + far_predictions = pd.concat(d.values(), keys=d.keys(), axis=1) + + sim.add_table("feasibility", far_predictions) + + +The ``developer model`` is responsible for picking among feasible buildings +in order to meet demand. It provides a simple utility to compute the number +of units (or amount of floorspace) to build :: + + dev = developer.Developer(feasibility.to_frame()) + + target_units = dev.\ + compute_units_to_build(len(agents), + buildings[supply_fname].sum(), + target_vacancy) + + new_buildings = dev.pick(forms, + target_units, + parcel_size, + ave_unit_size, + total_units, + max_parcel_size=max_parcel_size, + drop_after_build=True, + residential=residential, + bldg_sqft_per_job=bldg_sqft_per_job) + + if year is not None: + new_buildings["year_built"] = year + + if form_to_btype_callback is not None: + new_buildings["building_type_id"] = new_buildings["form"].\ + apply(form_to_btype_callback) + + all_buildings = dev.merge(buildings.to_frame(buildings.local_columns), + new_buildings[buildings.local_columns]) + + sim.add_table("buildings", all_buildings) + .. toctree:: :maxdepth: 2 - developer - sqftproforma + +Square Foot Pro Forma API +~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
automodule:: urbansim.developer.sqftproforma + :members: + +Developer Model API +~~~~~~~~~~~~~~~~~~~ + +.. automodule:: urbansim.developer.developer + :members: diff --git a/docs/developer/sqftproforma.rst b/docs/developer/sqftproforma.rst deleted file mode 100644 index 7b0e7a56..00000000 --- a/docs/developer/sqftproforma.rst +++ /dev/null @@ -1,9 +0,0 @@ -Square-Foot Proforma -==================== - -API ---- - -.. automodule:: urbansim.developer.sqftproforma - :members: - :undoc-members: diff --git a/urbansim/developer/developer.py b/urbansim/developer/developer.py index 756ad3e4..9ed3ccd5 100644 --- a/urbansim/developer/developer.py +++ b/urbansim/developer/developer.py @@ -17,7 +17,7 @@ def __init__(self, feasibility): self.feasibility = feasibility @staticmethod - def max_form(f, colname): + def _max_form(f, colname): """ Assumes dataframe with hierarchical columns with first index equal to the use and second index equal to the attribute. @@ -36,7 +36,6 @@ def max_form(f, colname): max_profit max_profit_far total_cost - """ df = f.stack(level=0)[[colname]].stack().unstack(level=1).reset_index(level=1, drop=True) return df.idxmax(axis=1) @@ -52,13 +51,17 @@ def keep_form_with_max_profit(self, forms=None): forms: list of strings List of forms which compete which other. Can leave some out. + Returns + ------- + Nothing. Goes from a multi-index to a single index with only the + most profitable form. """ f = self.feasibility if forms is not None: f = f[forms] - mu = self.max_form(f, "max_profit") + mu = self._max_form(f, "max_profit") indexes = [tuple(x) for x in mu.reset_index().values] df = f.stack(level=0).loc[indexes] df.index.names = ["parcel_id", "form"] @@ -81,7 +84,7 @@ def compute_units_to_build(num_agents, num_units, target_vacancy): Returns ------- - int + number_of_units : int the number of units that need to be built """ print "Number of agents: {:,}".format(num_agents) @@ -137,9 +140,14 @@ def pick(self, form, target_units, parcel_size, ave_unit_size, have been chosen for development. Usually this is true so as to not develop the same parcel twice. residential: bool - If creating non-residential buildings set this to false and developer - will fill in job_spaces rather than residential_units + If creating non-residential buildings set this to false and + developer will fill in job_spaces rather than residential_units + Returns + ------- + new_buildings : dataframe + DataFrame of buildings to add. These buildings are rows from the + DataFrame that is returned from feasibility. """ if isinstance(form, list): @@ -172,7 +180,8 @@ def pick(self, form, target_units, parcel_size, ave_unit_size, print "Sum of net units that are profitable: {:,}".\ format(int(df.net_units.sum())) if df.net_units.sum() < target_units: - print "WARNING THERE WERE NOT ENOUGH PROFITABLE UNITS TO MATCH DEMAND" + print "WARNING THERE WERE NOT ENOUGH PROFITABLE UNITS TO " \ + "MATCH DEMAND" choices = np.random.choice(df.index.values, size=len(df.index), replace=False, @@ -199,6 +208,17 @@ def merge(old_df, new_df): usually the buildings dataset and the new dataframe is a modified (by the user) version of what is returned by the pick method. 
+ Parameters + ---------- + old_df : dataframe + Current set of buildings + new_df : dataframe + New buildings to add, usually comes from this module + + Returns + ------- + df : dataframe + Combined DataFrame of buildings, makes sure indexes don't overlap """ maxind = np.max(old_df.index.values) new_df = new_df.reset_index(drop=True) diff --git a/urbansim/developer/sqftproforma.py b/urbansim/developer/sqftproforma.py index 4c34c11c..9d16cc5b 100644 --- a/urbansim/developer/sqftproforma.py +++ b/urbansim/developer/sqftproforma.py @@ -11,8 +11,6 @@ class SqFtProFormaConfig(object): This class encapsulates the configuration options for the square foot based pro forma. - Attributes - ---------- parcel_sizes : list A list of parcel sizes to test. Interestingly, right now the parcel sizes cancel is this style of pro forma computation so @@ -429,10 +427,11 @@ def get_debug_info(self, form, parking_config): Returns ------- - A dataframe where the index is the far with many columns representing - intermediate steps in the pro forma computation. Additional documentation - will be added at a later date, although many of the columns should be fairly - self-expanatory. + debug_info : dataframe + A dataframe where the index is the far with many columns + representing intermediate steps in the pro forma computation. + Additional documentation will be added at a later date, although + many of the columns should be fairly self-expanatory. """ return self.dev_d[(form, parking_config)] @@ -444,14 +443,15 @@ def get_ave_cost_sqft(self, form): Parameters ---------- form : string - Get a series representing the average cost per sqft for each form in the - config + Get a series representing the average cost per sqft for each form in + the config Returns ------- - A series where the index is the far and the values are the average cost per - sqft at which the building is "break even" given the configuration parameters - that were passed at run time. + cost : series + A series where the index is the far and the values are the average + cost per sqft at which the building is "break even" given the + configuration parameters that were passed at run time. """ return self.min_ave_cost_d[form] @@ -474,7 +474,6 @@ def lookup(self, form, df, only_built=True): or when debugging) Input Dataframe Columns - ------- rent : dataframe A set of columns, one for each of the uses passed in the configuration. Values are yearly rents for that use. Typical column names would be @@ -503,9 +502,8 @@ def lookup(self, form, df, only_built=True): Returns ------- - A dataframe which is indexed by the parcel ids that were passed, with the - following columns. - + index : Series, int + parcel identifiers building_sqft : Series, float The number of square feet for the building to build. Keep in mind this includes parking and common space. Will need a helpful function From cddbb24f5b5a3450b41e623a46a2ce8ba4d5883f Mon Sep 17 00:00:00 2001 From: Fletcher Foti Date: Wed, 13 Aug 2014 14:40:21 -0700 Subject: [PATCH 3/5] developer docs complete --- docs/developer/index.rst | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/docs/developer/index.rst b/docs/developer/index.rst index ae5617d9..7ac3b48f 100644 --- a/docs/developer/index.rst +++ b/docs/developer/index.rst @@ -74,8 +74,25 @@ translation to units. :: The ``developer model`` is responsible for picking among feasible buildings -in order to meet demand. 
It provides a simple utility to compute the number -of units (or amount of floorspace) to build :: +in order to meet demand. An example usage of the model is shown below - which +is also lifted form the `sanfran_urbansim `_ demo. + +This module provides a simple utility to compute the number of units (or +amount of floorspace) to build. Although the vacancy rate *can* be applied +at the regional level, it can also be used to meet vacancy rates at a +sub-regional level. The developer model itself is agnostic to which parcels +the user passes it, and the user is responsible for knowing at which level of +geography demand is assumed to operate. The developer model then chooses +which buildings to "build," usually as a random choice weighted by profitability. +This means more profitable buildings are more likely to be built although +the results are a bit stochastic. + +The only remaining steps are then "bookkeeping" in the sense that some +additional fields might need to be added (``year_built`` or a conversion from +developer ``forms`` to ``building_type_ids``). Finally the new buildings +and old buildings need to be merged in such a way that the old ids are +preserved and not duplicated (new ids are assigned at the max of the old +ids+1 and then incremented from there). :: dev = developer.Developer(feasibility.to_frame()) From 222b7720cc5dc297b0ccb40fb90833f6a576bb7b Mon Sep 17 00:00:00 2001 From: Fletcher Foti Date: Wed, 13 Aug 2014 15:32:49 -0700 Subject: [PATCH 4/5] model implementation choices finished --- docs/examples.rst | 82 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index 2830e53e..120f4722 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -253,17 +253,93 @@ Fees and Subsidies Model Implementation Choices ---------------------------- -UrbanAccess or Zones -~~~~~~~~~~~~~~~~~~~~ +There are a number of model implementation choices that can be made in +implementing an UrbanSim regional forecasting tool, and this will describe a +few of the possibilities. There is definitely a set of best practices +though, so shoot us an email if you want more detail. Geographic Detail ~~~~~~~~~~~~~~~~~ +Although zone or block-level models can be done (and gridcells have been used +historically), at this point the geographic detail is typically at the parcel or +building level. If good information is available for individual units, +this level or detail is actually ideal. + +Most household and employment location choices choose building_ids at this +point, and the number of available units is measured as the supply of +units / job_spaces in the building minus the number of households / jobs in the +building. + +UrbanAccess or Zones +~~~~~~~~~~~~~~~~~~~~ + +It is fairly standard to combine the buildings from the locations discussed +above with some measure of the neighborhood around each building. The simplest +implementation of this idea is used in the sanfran_example - and is typical of +traditional GIS - which is to use aggregations within some higher level polygon. +In the most common case, the region has zones assigned and every parcel is +assigned a ``zone_id`` (the ``zone_id`` is then available on the other related +tables). Once ``zone_ids`` are available, vanilla Pandas is usable and GIS +is not strictly required. 
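+
+As a quick sketch of that kind of "vanilla Pandas" aggregation (``buildings``
+here is simply a placeholder DataFrame that already carries a ``zone_id``
+column, not code taken from the example), a zone-level average can be computed
+with a single ``groupby`` and then broadcast back to the disaggregate records: ::
+
+    # average residential price of the buildings in each zone
+    zone_avg_price = buildings.groupby('zone_id').residential_sales_price.mean()
+
+    # attach the zonal value back onto the individual buildings
+    buildings['zone_avg_price'] = buildings.zone_id.map(zone_avg_price)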
+
+Although this is the easiest implementation method, a pedestrian-scale
+network-based method is perhaps more appropriate when analyses are happening
+at the parcel- and building-scale and this is exactly the intended purpose
+of the `urbanaccess `_ framework.
+Most full UrbanSim implementations now use aggregations along the local street
+network, and ``urbanaccess`` will be released as an official product by the
+end of 2014.
+
+Jobs or Establishments
+~~~~~~~~~~~~~~~~~~~~~~
+
+Jobs by sector is often the unit of analysis for the non-residential side,
+as this kind of model is completely analogous to the residential side and is
+perhaps the easiest to understand. In some cases establishments can be used
+instead of jobs to capture different behavior of different size
+establishments, but fitting establishments into buildings then becomes a
+tricky endeavor (and modeling the movements of large employers should not
+really be part of the scope of the model system).
+
 Configuration of Models
 ~~~~~~~~~~~~~~~~~~~~~~~
 
+Some choices need to be made on the configuration of models. For instance,
+is there a single hedonic for residential sales price or is there a second
+model for rent? Is non-residential rent segmented by building type? How many
+different uses are there in the pro forma and what forms (mixes of uses) will be
+tested? The simplest model configuration is shown in the sanfran_urbansim
+example, and additional behavior can be captured to answer specific research
+questions.
+
 Dealing with NaNs
 ~~~~~~~~~~~~~~~~~
 
+There is not a standard method for dealing with NaNs (typically indicating
+missing data) within UrbanSim, but there is a good convention that can be
+used. First an injectable can be set with an object in this form (make sure
+to set the name appropriately): ::
+
+    sim.add_injectable("fillna_config", {
+        "buildings": {
+            "residential_sales_price": ("zero", "int"),
+            "non_residential_rent": ("zero", "int"),
+            "residential_units": ("zero", "int"),
+            "non_residential_sqft": ("zero", "int"),
+            "year_built": ("median", "int"),
+            "building_type_id": ("mode", "int")
+        },
+        "jobs": {
+            "job_category": ("mode", "str"),
+        }
+    })
 
-
+The keys in this object are table names, the values are also dictionaries
+where the keys are column names and the values are a tuple. The first value
+of the tuple is what to call the Pandas ``fillna`` function with,
+and can be a choice of "zero," "median," or "mode" and should be set
+appropriately by the user for the specific column. The second argument is
+the data type to convert to. The user can then call
+``utils.fill_na_from_config`` as in the `example `_ with a DataFrame and table name and all NaNs will be filled. This
+functionality will eventually be moved into UrbanSim.
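+
+For reference only, the convention described above boils down to something like
+the following sketch - this is an illustration of the fill rules, not the actual
+``utils.fill_na_from_config`` implementation: ::
+
+    def fill_na_column(series, how, dtype):
+        # pick the fill value according to the configured rule
+        if how == "zero":
+            fill = 0
+        elif how == "median":
+            fill = series.dropna().median()
+        elif how == "mode":
+            fill = series.dropna().mode().iloc[0]
+        return series.fillna(fill).astype(dtype)
+
+    # e.g. the ("median", "int") entry above amounts to
+    buildings["year_built"] = fill_na_column(buildings["year_built"], "median", "int")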
\ No newline at end of file From e798c694f2fc01209436bb9591dfa28fd23ded82 Mon Sep 17 00:00:00 2001 From: Fletcher Foti Date: Wed, 13 Aug 2014 16:15:48 -0700 Subject: [PATCH 5/5] scenario input documentation --- docs/examples.rst | 12 ----------- docs/gettingstarted.rst | 47 +++++++++++++++++++++++++++++++++++++++++ docs/maps/index.rst | 3 +++ 3 files changed, 50 insertions(+), 12 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index 120f4722..d1ad575a 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -238,18 +238,6 @@ See :ref:`dframe-explorer-website` for a description of how to use the website t Because the web service is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow. -Specifying Scenario Inputs --------------------------- - -Control Totals -~~~~~~~~~~~~~~ - -Zoning Changes -~~~~~~~~~~~~~~ - -Fees and Subsidies -~~~~~~~~~~~~~~~~~~ - Model Implementation Choices ---------------------------- diff --git a/docs/gettingstarted.rst b/docs/gettingstarted.rst index 5e7d2f62..87623c63 100644 --- a/docs/gettingstarted.rst +++ b/docs/gettingstarted.rst @@ -129,6 +129,53 @@ It should be noted that many other kinds of models can be included in the simula In general, any Python script that reads and writes data can be included to help answer a specific research question or to model a certain real-world behavior - models can even be parameterized in JSON or YAML and included in the standard model set and an ever-increasing set of functionality will be added over time. +Specifying Scenario Inputs +-------------------------- + +Although UrbanSim is designed to model real estate markets, +the *raison d'etre* of UrbanSim is as a scenario planning tool. Regional or +city planners want to understand how their cities will develop in the +presence or absence of different policies or in the context of different +assumptions that they have little or no control over, like economic growth or +migration of households. + +In a sense, this style of regional modeling is kind of like retirement +planning, but for cities - will there be enough room for all the households and +jobs if the city grows by 3% every year? What if it grows by 5%? 10%? +If current zoning policies don't appropriately accommodate that growth, +it's likely that prices will rise, but by how much? If growth is pushed to +different parts of the region, will there be environmental impacts or an +inefficient transportation network that increases traffic, travel times, +and infrastructure costs? What will the resulting urban form look like? +Sprawl, Manhattan, or something in between? + +UrbanSim is designed to investigate these questions, and other questions like +them, and to allow outcomes to be analyzed as assumptions are changed. These +assumptions can include, but are not limited to the following. + +* *Control Totals* specify in a `simple Excel-based format + `_ the basic assumptions + on demographic shifts of households and of sector shifts of employment. + These files control the transition models and which new households and jobs + are added to the simulation. 
+
+* *Zoning Changes* in the form of scenario-specific density limits such as
+  ``max_far`` and ``max_dua`` are `passed to the pro formas `_
+  when testing for feasibility. Simple `utility functions `_ are also common to
+  *upzone* certain parcels only if certain policies affect them.
+
+* *Fees and Subsidies* may also come into play by adjusting the feasibility
+  of buildings that are market-rate infeasible. Fees can also be collected on
+  profitable buildings and transferred to less profitable buildings,
+  as with affordable housing policies.
+
+* *Developer Assumptions* can also be tested, like interest rates,
+  the impact of mixed-use buildings on feasibility, of density bonuses for
+  neighborhood amenities, and of lowering or raising parking requirements.
+
 Taking the Next Step
 --------------------
 
diff --git a/docs/maps/index.rst b/docs/maps/index.rst
index 4edadab9..af37ec7d 100644
--- a/docs/maps/index.rst
+++ b/docs/maps/index.rst
@@ -34,6 +34,9 @@ Here is what each dropdown on the web page does:
 
 * The sixth dropdown executes the `.query `_ method on the Pandas DataFrame in order to filter the input data
 * The seventh dropdown executes the `.eval `_ method on the Pandas DataFrame in order to create simple computed variables that are not already columns on the DataFrame.
 
+What's it Doing Exactly?
+------------------------
+
 So what is this doing? The web service is translating the drop downs to a simple interactive Pandas statement, for example: ::
 
     df.groupby('zone_id')['residential_units'].sum()