treilly edited this page Sep 12, 2014 · 5 revisions

Things you'd like to see

  • What is data?
    • Not just columns: data types
      • Numeric - Integer, Float
      • Text
      • Boolean
      • Time - Date, Datetime
      • Factor (categorical)
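A small sketch of how the data types above might look as values in one record, using only the Python standard library. The field names are hypothetical. Modeling a "factor" as an `Enum` is one possible choice among several (a pandas `Categorical` is another). The point is that a factor only accepts a fixed set of categories.

```python
from datetime import date, datetime
from enum import Enum


# A "factor": a value restricted to a fixed set of categories.
# Color and its members are hypothetical examples.
class Color(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


# One hypothetical record with a field of each type.
record = {
    "count": 42,                              # Numeric - Integer
    "price": 19.99,                           # Numeric - Float
    "name": "widget",                         # Text
    "in_stock": True,                         # Boolean
    "released": date(2014, 9, 12),            # Time - Date
    "updated": datetime(2014, 9, 12, 8, 30),  # Time - Datetime
    "color": Color("red"),                    # Factor: Color("purple") would raise ValueError
}

# Print each field name next to the Python type of its value.
for field, value in record.items():
    print(f"{field:>9}: {type(value).__name__}")
```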
  • Structured versus unstructured data - degrees of messiness
    • Working with unstructured data usually means writing code to extract typed fields from it
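As a sketch of "writing code to extract typed fields," the snippet below pulls a timestamp, a method, a path and two integers out of free-text log lines with a regular expression. The log format is an assumption made up for illustration. Lines that don't match are skipped instead of guessed at.

```python
import re
from datetime import datetime

# Hypothetical log format, assumed for illustration:
#   "2014-09-12 08:30:00 GET /index.html 200 512"
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<size>\d+)"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line doesn't match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),  # text -> integer
        "size": int(m["size"]),      # text -> integer
    }


lines = [
    "2014-09-12 08:30:00 GET /index.html 200 512",
    "garbage that does not match",
    "2014-09-12 08:31:15 POST /api/items 201 64",
]
# Keep only lines that parsed successfully.
rows = [r for r in (parse_line(line) for line in lines) if r is not None]
for row in rows:
    print(row["ts"].isoformat(), row["method"], row["path"], row["status"], row["size"])
```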
  • Where does data live? Datastores
    • Files - .csv, .xls, .txt
    • Databases - Oracle, MySQL, PostgreSQL
    • "Big Data"
      • NoSQL - Cassandra
      • Hadoop
        • Tools built on top of Hadoop - Hive, Spark, etc.
  • Why databases?
    • Access control - permissions
    • Centralization
    • Transactions
    • Backup
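To make "Transactions" concrete, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The table and account names are hypothetical. A transfer either applies both updates or neither. When the debit would break the `CHECK` constraint, the `with conn:` block rolls back the credit that already ran.

```python
import sqlite3

# In-memory SQLite database; the accounts table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # CHECK (balance >= 0) failed; the credit to dst was rolled back too.
        return False


ok1 = transfer(conn, "alice", "bob", 30)   # succeeds
ok2 = transfer(conn, "bob", "alice", 500)  # would overdraw bob, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok1, ok2, balances)  # True False {'alice': 70, 'bob': 80}
```

Without the transaction, the failed transfer could leave alice credited with 500 even though nothing was taken from bob.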
  • Big Data
    • What is it?
    • Why?
  • How do you work with data?
    • Data in - ingest from one of the datastores above
      • Read from files
      • Use SQL
      • Use APIs
    • Data transformation
      • Feature extraction
      • Aggregation
    • Data out - graphs, etc.
      • ggplot2, Matplotlib, Tableau, d3.js
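The steps above (data in, transformation, data out) can be sketched end to end with the Python standard library. An in-memory string stands in for a .csv file, and its columns and values are hypothetical. "Feature extraction" here means deriving a weekend flag from the order date. "Aggregation" means totaling the amount per region. The final print stands in for the graphing step, where a tool such as Matplotlib or ggplot2 would take over.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Stand-in for a .csv file on disk; column names and values are hypothetical.
RAW = """order_id,order_date,region,amount
1,2014-09-08,east,20.00
2,2014-09-08,west,35.50
3,2014-09-09,east,12.25
4,2014-09-13,west,40.00
"""

# Data in: read each row and convert its text fields to proper types.
rows = []
for rec in csv.DictReader(io.StringIO(RAW)):
    rows.append({
        "order_id": int(rec["order_id"]),
        "order_date": date.fromisoformat(rec["order_date"]),
        "region": rec["region"],
        "amount": float(rec["amount"]),
    })

# Feature extraction: derive a new column from an existing one.
for row in rows:
    row["is_weekend"] = row["order_date"].weekday() >= 5  # Saturday=5, Sunday=6

# Aggregation: total amount per region.
totals = defaultdict(float)
for row in rows:
    totals[row["region"]] += row["amount"]

# Data out: a plain-text summary in place of a chart.
for region, total in sorted(totals.items()):
    print(f"{region:<5} {total:8.2f}")
```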