Commit fb87937

Fix docs typos (#19)

Seth-Rothschild authored and kmax12 committed Oct 23, 2017
1 parent ddba9c3 commit fb87937

Showing 5 changed files with 7 additions and 7 deletions.
4 changes: 2 additions & 2 deletions docs/source/automated_feature_engineering/afe.rst
@@ -30,7 +30,7 @@ Running DFS

 Typically, without automated feature engineering, a data scientist would write code to aggregate data for a customer, and apply different statistical functions resulting in features quantifying the customer's behavior. In this example, an expert might be interested in features such as: `total number of sessions` or `month the customer signed up`.

-These features can generated by DFS when we specify the target_entity as ``customers`` and ``Count`` and ``Month`` as primitives.
+These features can be generated by DFS when we specify the target_entity as ``customers`` and ``Count`` and ``Month`` as primitives.

 .. ipython:: python
@@ -98,7 +98,7 @@ Stacking results in features that are more expressive than individual primitives

 Changing Target Entity
 **********************

-DFS is powerful because we can create a feature matrix for any entity in our dataset. If we switch our target entity to "sessions", we can synthesize features for each session instead of each customer. Now, we can use these features to predict an the outcome of a session.
+DFS is powerful because we can create a feature matrix for any entity in our dataset. If we switch our target entity to "sessions", we can synthesize features for each session instead of each customer. Now, we can use these features to predict the outcome of a session.

 .. ipython:: python
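The hunks above mention the ``Count`` and ``Month`` primitives for a ``customers`` target entity. As an editorial aside, here is a rough plain-pandas sketch of what those two primitives compute (toy data and column names are assumptions for illustration, not the Featuretools API):

```python
import pandas as pd

# Hypothetical toy tables mirroring the docs' customers/sessions layout.
sessions = pd.DataFrame({
    "session_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 2, 2],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "join_date": pd.to_datetime(["2017-01-15", "2017-03-02"]),
})

# COUNT(sessions): total number of sessions per customer (an aggregation primitive).
counts = sessions.groupby("customer_id").size().rename("COUNT(sessions)")

# MONTH(join_date): month the customer signed up (a transform primitive).
customers["MONTH(join_date)"] = customers["join_date"].dt.month

# One row per customer, analogous to a DFS feature matrix for target_entity "customers".
feature_matrix = customers.set_index("customer_id").join(counts)
```

Switching the target entity to sessions would instead produce one row per ``session_id``, which is the change the second hunk describes.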
@@ -19,7 +19,7 @@ Featuretools is designed to take time into consideration when required. By speci

 **Motivating Example**

-Consider the problem to predict if a customer is likely to buy an upgrade to their membership plan. To do this, you first identify historical examples of customer who upgraded and others who did not. For each customer, you can only use the interactions s/he had prior to upgrading or not upgrading their membership. This is a requirement -- by definition.
+Consider the problem to predict if a customer is likely to buy an upgrade to their membership plan. To do this, you first identify historical examples of customers who upgraded and others who did not. For each customer, you can only use the interactions s/he had prior to upgrading or not upgrading their membership. This is a requirement -- by definition.

 The example above illustrates the importance of time in calculating features. Other situations are more subtle, and hence when building predictive models it is important identify if time is a consideration. If feature calculation does not account for time, it may include data in calculations that is past the outcome we want to predict and may cause the well known problem of *Label Leakage*.

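The motivating example above is about using only the interactions each customer had before their upgrade decision. A minimal pandas sketch of that per-customer cutoff filtering (toy data and names are assumptions, not the Featuretools cutoff-time API):

```python
import pandas as pd

# Hypothetical interaction log.
interactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "time": pd.to_datetime(["2017-01-01", "2017-06-01",
                            "2017-02-01", "2017-07-01"]),
})

# One cutoff per customer: the moment they upgraded (or did not).
cutoffs = pd.Series(pd.to_datetime(["2017-05-01", "2017-03-01"]),
                    index=[1, 2], name="cutoff_time")

# Keep only rows strictly before each customer's own cutoff,
# so features never use data past the outcome (no label leakage).
valid = interactions[
    interactions["time"] < interactions["customer_id"].map(cutoffs)
]
```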
2 changes: 1 addition & 1 deletion docs/source/guides/tuning_dfs.rst
@@ -86,7 +86,7 @@ Machine learning algorithms typically expect all numeric data. When Deep Feature

     feature_matrix

-This feature matrix contains 2 categorical variables, ``zip_code`` and ``MODE(sessions.device)``. We can use the feature matrix and feature definitions to encode these categorical values. Featuretools overs functionality to apply one hot encoding to the output of DFS.
+This feature matrix contains 2 categorical variables, ``zip_code`` and ``MODE(sessions.device)``. We can use the feature matrix and feature definitions to encode these categorical values. Featuretools offers functionality to apply one hot encoding to the output of DFS.

 .. ipython:: python
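The corrected sentence above says Featuretools offers its own one-hot-encoding helper for DFS output. As a library-free illustration of what that encoding does, the same transformation with pandas ``get_dummies`` (toy column values are assumptions):

```python
import pandas as pd

# Hypothetical feature matrix with the two categorical columns named in the docs.
feature_matrix = pd.DataFrame({
    "zip_code": ["60091", "13244", "60091"],
    "MODE(sessions.device)": ["desktop", "mobile", "desktop"],
    "COUNT(sessions)": [4, 2, 5],
})

# One-hot encode the categorical columns so every column is numeric.
encoded = pd.get_dummies(feature_matrix,
                         columns=["zip_code", "MODE(sessions.device)"])
```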
4 changes: 2 additions & 2 deletions docs/source/loading_data/using_entitysets.rst
@@ -4,7 +4,7 @@ Representing Data with EntitySets
 =================================
 .. currentmodule:: featuretools

-An ``EntitySet`` is a collection of entities and the relationships between them relationships. They are useful for preparing raw, structured datasets for feature engineering. While many functions in Featuretools take ``entities`` and ``relationships`` as separate arguments, it is recommended to create an ``EntitySet``, so you can more easily manipulate your data as needed.
+An ``EntitySet`` is a collection of entities and the relationships between them. They are useful for preparing raw, structured datasets for feature engineering. While many functions in Featuretools take ``entities`` and ``relationships`` as separate arguments, it is recommended to create an ``EntitySet``, so you can more easily manipulate your data as needed.

 .. ipython:: python
@@ -86,7 +86,7 @@ With two entities in our entity set, we can add a relationship between them.

 Adding a Relationship
 ~~~~~~~~~~~~~~~~~~~~~
-We want to relate these two entities by the columns called "product_id" in each entity. Each product as multiple transactions associated with it, so it is called it the **parent entity**, while the transactions entity is known as the **child entity**. When specifying relationships we list the variable in the parent entity first.
+We want to relate these two entities by the columns called "product_id" in each entity. Each product has multiple transactions associated with it, so it is called it the **parent entity**, while the transactions entity is known as the **child entity**. When specifying relationships we list the variable in the parent entity first.

 .. ipython:: python
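The hunk above defines a parent entity (products) and a child entity (transactions) related through a shared ``product_id`` column. A small pandas sketch of that one-to-many relationship (toy frames are assumptions; ``validate="many_to_one"`` simply asserts the parent key is unique, which is what makes products the parent):

```python
import pandas as pd

# Hypothetical parent entity: one row per product.
products = pd.DataFrame({"product_id": [1, 2], "brand": ["A", "B"]})

# Hypothetical child entity: many transactions per product.
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12],
    "product_id": [1, 1, 2],
    "amount": [5.0, 3.5, 7.25],
})

# Joining child to parent on product_id; validate raises if any
# product_id appeared more than once in the parent.
merged = transactions.merge(products, on="product_id",
                            validate="many_to_one")
```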
2 changes: 1 addition & 1 deletion docs/source/usage_tips/scaling.rst
@@ -16,7 +16,7 @@ When an entire dataset is not required to calculate the features for a given set

 Use Spark or Dask to distribute computation
 ---------------------------------------------
-If the data is so big that loading in chunks isn't an option, we can distribute the data and parallelize the computation using frameworks like `Spark <https://spark.apache.org/docs/latest/api/python/index.html>`_ or `Dask <http://dask.pydata.org/en/latest/>`_. Both of these systems support a dataframe interface that can easily be used to partition data as need. Because Featuretools is a python library, it is easy to integrate.
+If the data is so big that loading in chunks isn't an option, we can distribute the data and parallelize the computation using frameworks like `Spark <https://spark.apache.org/docs/latest/api/python/index.html>`_ or `Dask <http://dask.pydata.org/en/latest/>`_. Both of these systems support a dataframe interface that can easily be used to partition data as needed. Because Featuretools is a python library, it is easy to integrate.


 Feature Labs
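The fixed sentence above notes that Spark and Dask dataframes make it easy to partition data as needed. A plain-pandas sketch of that partition-then-aggregate idea, which those frameworks apply across many machines (toy data is an assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
transactions = pd.DataFrame({
    "customer_id": np.arange(100) % 10,
    "amount": rng.random(100),
})

# Partition by customer so each chunk can be processed independently
# (on separate workers, in a distributed setting).
partitions = [chunk for _, chunk in transactions.groupby("customer_id")]

# Aggregate each partition, then combine the per-partition results.
per_customer_total = pd.concat(
    chunk.groupby("customer_id")["amount"].sum() for chunk in partitions
)
```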
