0. Suggestions
treilly edited this page Sep 12, 2014 · 5 revisions
Things you'd like to see
- What is data?
  - Not just columns, but datatypes:
    - Numeric - Integer, Float
    - Text
    - Boolean
    - Time - Date, Datetime
    - Factor
  - Structured versus unstructured data - degrees of messiness
    - Working with unstructured data usually means writing code to extract typed fields from it
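As one possible illustration of extracting typed fields from unstructured text, the sketch below parses a free-text line into the datatypes listed above. The line format, the field names, and the `parse_line` helper are all invented for this example; real data will be messier.

```python
import re
from datetime import datetime

# Hypothetical log line; the format is invented for illustration.
LINE = "2014-09-12 14:03:22 user=alice items=3 total=19.99 paid=true"

PATTERN = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"user=(?P<user>\w+) items=(?P<items>\d+) "
    r"total=(?P<total>\d+\.\d+) paid=(?P<paid>true|false)"
)


def parse_line(line):
    """Turn one free-text line into a dict of typed fields, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None  # messy data: not every line fits the pattern
    return {
        "when": datetime.strptime(match["when"], "%Y-%m-%d %H:%M:%S"),  # Time
        "user": match["user"],            # Text
        "items": int(match["items"]),     # Numeric, integer
        "total": float(match["total"]),   # Numeric, float
        "paid": match["paid"] == "true",  # Boolean
    }


record = parse_line(LINE)
print(record)
```

Lines that do not match return `None` rather than raising, which is one simple way to cope with "degrees of messiness"; a stricter pipeline might log or count the rejects instead.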
- Where does data live? Datastores
  - Files - .csv, .xls, .txt
  - Databases - Oracle, MySQL, PostgreSQL
  - "Big Data"
    - NoSQL - Cassandra
    - Hadoop
    - Things on top of Hadoop - Hive, Spark, etc.
- Why Databases?
  - Control - Permissions
  - Centralization
  - Transactions
  - Backup
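To make the "Transactions" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the names in it, and the `transfer` helper are assumptions made up for the example. It shows that a group of changes either all take effect or none do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two rows; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` is executed first and then undone by the rollback, so no money appears from nowhere. A plain file offers no such guarantee.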
- Big Data
  - What is it?
  - Why?
- How to work with data?
  - Data in - ingest from one of the datastores above
    - Read from files
    - Use SQL
    - Use APIs
  - Data transformation
    - Feature extraction
    - Aggregation
  - Data out - graphs, etc.
    - ggplot2, Matplotlib, Tableau, d3.js
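The data in, transformation, and data out steps could be sketched roughly as follows, using only the Python standard library. The sample data, the column names, and the `mean_by_key` helper are invented for illustration. The "data out" step prints text here, where a real workflow might hand the result to one of the plotting tools listed above.

```python
import csv
import io
from collections import defaultdict

# Invented sample data standing in for a .csv file on disk.
RAW = """city,date,temp_c
Boston,2014-09-10,21.5
Boston,2014-09-11,19.0
Austin,2014-09-10,33.0
Austin,2014-09-11,35.0
"""


def mean_by_key(rows, key, value):
    """Aggregation: average the `value` column for each distinct `key`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += float(row[value])  # csv yields text; cast to float
        counts[row[key]] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Data in: read rows as dicts keyed by the header line.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Data transformation: aggregate temperature per city.
means = mean_by_key(rows, "city", "temp_c")

# Data out: a plain-text report.
for city, mean in sorted(means.items()):
    print(f"{city}: {mean:.2f}")
```

Swapping `io.StringIO(RAW)` for `open("some_file.csv", newline="")` would read from an actual file; the file name is a placeholder.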