Merge pull request #431 from olivito/patch-2

update docs for uploading pandas dataframe as Tamr dataset
Datatamer · Aug 7, 2020 · 0a1b54d
2 parents 8eebaa7 + 65cd493
Showing 1 changed file with 10 additions and 3 deletions.
diff --git a/docs/user-guide/pandas.md b/docs/user-guide/pandas.md
@@ -97,20 +97,27 @@ stored in the removed attributes.
 ## Upload Dataframe as Dataset
 
 ### Create New Dataset
-To create a new dataset and upload data, the convenience function `dataset.create_from_dataframe()` can be used. 
+To create a new dataset and upload data, the convenience function `datasets.create_from_dataframe()` can be used. 
 Note that Tamr will throw an error if columns aren't generally formatted as strings. (The exception being geospatial
 columns. For that, see the geospatial examples.)
 
-In order to achieve this, the following code will transform the column types to string.
+To format values as strings while preserving null information, specify `dtype=object` when creating a dataframe from a csv file.
 ```python
-df = df.astype(str)
+df = pd.read_csv("my_file.csv", dtype=object)
 ```
 
 Creating the dataset is as easy as calling:
 ```python
 tamr.datasets.create_from_dataframe(df, 'primaryKey', 'my_new_dataset')
 ```
 
+For an already-existing dataframe, the columns can be converted to strings using:
+```python
+df = df.astype(str)
+```
+Note, however, that converting this way will cause any `NaN` or `None` values to become strings like `'nan'` 
+that will persist into the created Tamr dataset.
+
 ### Changing Values
 
 #### Making Changes: In Memory
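
The null-handling difference described in the added lines above can be checked locally. The following is a minimal sketch and is not part of this pull request: the inline CSV text, its column names, and the helper `stringify_preserving_nulls` are hypothetical. The helper is one assumed way to convert an already-existing dataframe to strings without turning missing values into text.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "my_file.csv"; one name is missing.
csv_text = "primaryKey,name,age\n1,Alice,30\n2,,41\n"

# dtype=object keeps every parsed value as a string and leaves the empty
# field as a missing value (NaN) rather than text.
df_obj = pd.read_csv(io.StringIO(csv_text), dtype=object)


def stringify_preserving_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert non-missing values to str, keep missing values."""
    return df.apply(lambda col: col.map(lambda v: v if pd.isna(v) else str(v)))


# A dataframe that was read with inferred types (age becomes an integer column).
df_typed = pd.read_csv(io.StringIO(csv_text))
df_clean = stringify_preserving_nulls(df_typed)

print(df_obj["name"].isna().sum())    # number of missing names after dtype=object
print(df_clean["name"].isna().sum())  # number of missing names after the helper
```

Either dataframe could then be passed to `tamr.datasets.create_from_dataframe(df, 'primaryKey', 'my_new_dataset')` as shown in the documented example.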