12 changes: 6 additions & 6 deletions bag.ipynb
@@ -7,9 +7,9 @@
"# Dask Bags\n",
"\n",
"\n",
-"Dask Bag implements operations like `map`, `filter`, `groupby` and aggregations on collections of Python objects. It does this in parallel and in small memory using Python iterators. It is similar to a parallel version of itertools or a Pythonic version of the PySpark RDD.\n",
+"Dask Bag implements operations like `map`, `filter`, `groupby` and aggregations on collections of Python objects. It does this in parallel and in small memory footprint using Python iterators. It is similar to a parallel version of itertools or a Pythonic version of the PySpark RDD.\n",
"\n",
-"Dask Bags are often used to do simple preprocessing on log files, JSON records, or other user defined Python objects.\n",
+"Dask Bags are often used to do simple preprocessing on log files, JSON records, or other user-defined Python objects.\n",
"\n",
"Full API documentation is available here: http://docs.dask.org/en/latest/bag-api.html"
]
@@ -23,7 +23,7 @@
"Starting the Dask Client is optional. It will provide a dashboard which \n",
"is useful to gain insight on the computation. \n",
"\n",
-"The link to the dashboard will become visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. This can take some effort to arrange your windows, but seeing them both at the same is very useful when learning."
+"The link to the dashboard will become visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. This can take some effort to arrange your windows, but seeing them both at the same time is very useful when learning."
]
},
{
@@ -68,7 +68,7 @@
"source": [
"## Read JSON data\n",
"\n",
-"Now that we have some JSON data in a file lets take a look at it with Dask Bag and Python JSON module."
+"Now that we have some JSON data in a file let's take a look at it with Dask Bag and Python JSON module."
]
},
{
@@ -204,7 +204,7 @@
"\n",
"Dask Bags are good for reading in initial data, doing a bit of pre-processing, and then handing off to some other more efficient form like Dask Dataframes. Dask Dataframes use Pandas internally, and so can be much faster on numeric data and also have more complex algorithms. \n",
"\n",
-"However, Dask Dataframes also expect data that is organized as flat columns. It does not support nested JSON data very well (Bag is better for this).\n",
+"However, Dask Dataframes also expect data that is organized as flat columns. It does not support nested JSON data very well (Bag is better for this). For deeply nested data, consider flattening or using Bag first, then convert to DataFrame\n",
"\n",
"Here we make a function to flatten down our nested data structure, map that across our records, and then convert that to a Dask Dataframe."
]
@@ -295,7 +295,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.12"
+"version": "3.12.3"
}
},
"nbformat": 4,
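
The sentence added in the hunk at line 204 recommends flattening deeply nested records before converting to a DataFrame. A minimal sketch of that flattening step follows, using only the standard library. The sample records, the `flatten` helper, and the `_` separator are hypothetical illustrations, not the notebook's own code.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Recursively lift nested dict fields into a single flat dict of columns."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects, prefixing child keys with the parent key.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical line-delimited JSON records with two levels of nesting.
lines = [
    '{"name": "Alice", "address": {"city": "Oslo", "geo": {"lat": 59.9}}}',
    '{"name": "Bob", "address": {"city": "Lima", "geo": {"lat": -12.0}}}',
]

records = [json.loads(line) for line in lines]
flat_records = [flatten(record) for record in records]

# With Dask installed, the same helper could be mapped over a Bag and the
# result converted, along the lines of (not executed here):
#   import dask.bag as db
#   df = db.from_sequence(lines).map(json.loads).map(flatten).to_dataframe()
```

Every flattened record then has the same flat set of keys (`name`, `address_city`, `address_geo_lat`), which is the shape a DataFrame expects.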