
Samples for Azure Databricks Orientation

This is a sample code repository (Python) for Azure Databricks Orientation. It covers a range of useful usage scenarios, from beginner to intermediate level.

Section 1

Section 2

  • Mounting Azure Blob Storage
  • Exploring sample data (JSON) in Azure Blob Storage with the json module and Pandas
  • Flattening the first level of nested column data
  • Flattening the second level of nested column data
  • Plotting relationships between columns with Seaborn
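The two flattening steps can be sketched with nothing more than the standard json module. This is a minimal, hypothetical example: the record below is made up and only loosely shaped like nested program data (the field names `programs`, `works`, `soloists` and `composerName` are assumptions, not a description of the real file's schema), and the helper `flatten` is illustrative rather than the notebook's own code. With pandas, `pandas.json_normalize` (through its `record_path` and `meta` parameters) performs a comparable expansion.

```python
import json

# Hypothetical nested record; field names are illustrative assumptions,
# not the actual schema of data/raw_nyc_phil.json.
RAW = """
{"programs": [
  {"id": "p1", "season": "1842-43",
   "works": [
     {"composerName": "Beethoven",
      "soloists": [{"soloistName": "A"}, {"soloistName": "B"}]},
     {"composerName": "Weber", "soloists": []}
   ]}
]}
"""


def flatten(records, list_key, parent_fields):
    """Return one flat dict per element of record[list_key].

    Each output row copies the chosen parent fields, then the child's own
    keys (nested lists are kept, so the result can be flattened again).
    """
    rows = []
    for record in records:
        for child in record.get(list_key) or []:
            row = {field: record.get(field) for field in parent_fields}
            row.update(child)
            rows.append(row)
    return rows


programs = json.loads(RAW)["programs"]

# First level: one row per work, carrying program-level fields along.
works = flatten(programs, "works", ["id", "season"])
# Second level: one row per soloist, carrying work-level fields along.
soloists = flatten(works, "soloists", ["id", "season", "composerName"])

for row in soloists:
    print(row)
```

In this sketch a work with an empty `soloists` list produces no second-level rows; whether dropping such rows is acceptable depends on the analysis.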

Section 3

Section 4

  • Exploring sample data (CSV) in Azure Data Lake Storage (ADLS) with Pandas
  • Data cleaning with Pandas
  • Saving cleaned data back to ADLS
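As one illustration of CSV cleaning, a free-text publication-date column can be reduced to a numeric year. The sketch below assumes a column holding values such as `1879 [1878]` or `[1839]`; the sample values and the helper `extract_year` are hypothetical, and plain Python is used so the example runs without pandas. In pandas, `Series.str.extract(r"(\d{4})", expand=False)` applies the same pattern to a whole column (the result is still text and would need converting to numbers).

```python
import re

# Hypothetical raw values of the sort a free-text publication-date column
# might hold; both the column and the values are assumptions.
raw_dates = ["1879 [1878]", "1868", "[1839]", "1860-1862", "", None, "c. 1897"]

# First run of four digits anywhere in the string.
YEAR_RE = re.compile(r"\d{4}")


def extract_year(value):
    """Return the first four-digit run in value as an int, or None."""
    if not value:
        return None
    match = YEAR_RE.search(value)
    return int(match.group()) if match else None


cleaned = [extract_year(v) for v in raw_dates]
print(cleaned)  # [1879, 1868, 1839, 1860, None, None, 1897]
```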

Section 5

  • Data cleaning and preparation with PySpark

List of Files

  • data/ > sample source data directory
  • data/pima-indians-diabetes-data.csv > Pima Indians Diabetes Database in CSV
  • data/pima-indians-diabetes-data-2.csv > Pima Indians Diabetes Database in CSV, with a column header row
  • data/raw_nyc_phil.json > New York Philharmonic Performance History in JSON
  • data/BL-Flickr-Images-Book.csv > Sample CSV data for data cleaning
  • Samples_for_Orientation_MASKED.ipynb > Exported notebook from Azure Databricks (for Sections 1 to 3)
  • Samples_for_Orientation_MASKED.html > Exported HTML (with results and visuals) from Azure Databricks (for Sections 1 to 3)
  • Samples_for_Orientation_2_MASKED.ipynb > Exported notebook from Azure Databricks (for Section 4)
  • Samples_for_Orientation_2_MASKED.html > Exported HTML (with results and visuals) from Azure Databricks (for Section 4)
  • Data_Cleansing_and_Preparation_with_PySpark_MASKED.ipynb > Exported notebook from Azure Databricks (for Section 5)
  • Data_Cleansing_and_Preparation_with_PySpark_MASKED.html > Exported HTML (with results and visuals) from Azure Databricks (for Section 5)
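The two Pima CSV files differ only in whether the first row holds column names. A minimal sketch of reading both into the same shape with the standard csv module follows; the column names are the ones commonly published with this dataset and are an assumption here, as are the two sample rows and the helper `read_rows`. With pandas, `pd.read_csv(path, header=None, names=COLUMNS)` handles a headerless file in a similar way.

```python
import csv
import io

# Assumed column names (commonly used for the Pima Indians Diabetes data);
# the headerless file carries no names of its own.
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Two illustrative rows standing in for the headerless file.
HEADERLESS = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
# The same rows with a header line prepended, standing in for the "-2" file.
WITH_HEADER = ",".join(COLUMNS) + "\n" + HEADERLESS


def read_rows(text, header=True):
    """Parse CSV text into dicts.

    When header is False, the assumed COLUMNS list supplies the keys;
    otherwise csv.DictReader takes them from the first row.
    """
    fieldnames = None if header else COLUMNS
    return list(csv.DictReader(io.StringIO(text), fieldnames=fieldnames))


no_header_rows = read_rows(HEADERLESS, header=False)
header_rows = read_rows(WITH_HEADER)

# Both inputs end up as the same list of records (values are still strings).
print(no_header_rows == header_rows)  # True
print(no_header_rows[0]["Glucose"])   # 148
```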
