
first cut at exported data

graysky committed Jan 3, 2012 (1 parent: baa5cbd, commit: ac679b5dfff6bc1e6c68bcc9ec1d6d75151e4bbc)
Showing with 19,469 additions and 2,766 deletions.
  1. +48 −13 README.md
  2. +30 −5 importer.rb
  3. +2,748 −2,748 items.txt
  4. +14,215 −0 reviews.txt
  5. +2,428 −0 screenshots.txt
61 README.md
@@ -1,25 +1,60 @@
# oneforty app data
-Future home of open data about 4,000+ social media applications.
+Open data on about 4,000 social media apps and related screenshots and reviews from oneforty.com.
-# License
+
+## License
This data is licensed under the Creative Commons Attribution 3.0 license (http://creativecommons.org/licenses/by/3.0/).
It includes a requirement that
-# Data
-
-- active Apps
--- id
--- name
--- desc
--- tagline
--- url
--- created_at
-
-# Opening in Excel
+## Data and Schema
+
+The following describes the schema for the data files. Each file is tab-delimited and is in utf-8. There is an example importer included that uses Ruby 1.9's CSV library to parse each document.
+
+- items (aka apps)
+-- id - database id of the app.
+-- name - app name.
+-- tagline (optional) - short (140 character) description of the app.
+-- twitter account - if the app registered a related twitter account.
+-- url - URL for the app's homepage.
+-- permalink - generated permalink used as the slug on oneforty's urls.
+-- rank score - an opaque estimate of popularity, ranging from 0.0 - 100.0.
+-- average rating - average user rating, from 0 to 5.0. Note that an average rating of 0.0 means the app has no ratings, since the lowest possible rating is 1 star.
+-- created at - UTC timestamp when the app was created.
+-- platform1/platform2/platform3 - list of up to 3 platforms (iPhone, Mac, etc). List of platforms is a fixed set.
+-- category1/category2/category3 - list of up to 3 categories (Clients, Analytics, etc). List of categories is a fixed set.
+-- tags - user-generated tags separated by commas. Free-form values.
+-- developer name - if known, the name of the app developer
+-- developer twitter - if known, the twitter handle of the app developer
+-- description - long form app description.
+
+Many items have an accompanying icon (or logo) in the images/items directory. They are named like [item id]_[style].png where style is either "thumb" (100x100) or "original" (no fixed dimensions).
+
+- reviews
+-- id - database id of the review.
+-- item_id - id of the reviewed app.
+-- reviewer name - name of the reviewer.
+-- reviewer twitter - twitter handle of the reviewer
+-- rating - rating from 1 to 5. Note: rating is optional.
+-- quality score - higher score indicates it was valuable to other users. A score of 0 is neutral. Many spammy reviews have been removed already.
+-- created at - UTC timestamp.
+-- review - long form body of the review.
+
+- screenshots - developer or user supplied app screenshots.
+-- id - database id of the screenshot
+-- item_id - id of the screenshotted app.
+-- title - caption for the screenshot.
+-- content type - content type of the screenshot (ex image/png, image/jpeg)
+-- original file name - file name of the "original"-sized screenshot in images/screenshots
+-- thumb file name - file name of the "thumb"-sized (100x100) screenshot in images/screenshots
+-- large file name - file name of the "large"-sized (400x400) screenshot in images/screenshots
+-- created at - UTC timestamp.
+
+
+### Opening in Excel
1. Import the .txt file into Excel
2. In Text Import Wizard, on 1st screen choose "Delimited"
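
The schema notes in the README hunk above can be sketched in Ruby as follows. This is a minimal illustration, not the repository's importer: the header names (`id`, `name`, `average_rating`, `tags`) and the helpers `parse_items` and `icon_path` are assumptions made for the example, so check the first line of items.txt for the real column names. It shows three conventions the README describes: tab-delimited rows, an average rating of 0.0 meaning "no ratings", and icon files named `[item id]_[style].png` under images/items.

```ruby
require "csv"

# Hypothetical sample; the real header names in items.txt may differ.
SAMPLE = "id\tname\taverage_rating\ttags\n" \
         "1\tExample App\t0.0\ttwitter,client\n" \
         "2\tOther App\t4.5\t\n"

# Parse tab-delimited text into an array of hashes (hypothetical helper).
def parse_items(text)
  CSV.parse(text, col_sep: "\t", headers: true, skip_blanks: true).map do |row|
    rating = row["average_rating"].to_f
    {
      id: row["id"].to_i,
      name: row["name"],
      # The README says 0.0 means "no ratings", so map it to nil.
      average_rating: rating.zero? ? nil : rating,
      # Tags are free-form values separated by commas; an empty field gives [].
      tags: (row["tags"] || "").split(",").map(&:strip)
    }
  end
end

# Build the icon path described in the README: [item id]_[style].png
def icon_path(item_id, style)
  unless %w[thumb original].include?(style)
    raise ArgumentError, "style must be 'thumb' or 'original'"
  end
  File.join("images", "items", "#{item_id}_#{style}.png")
end

ITEMS = parse_items(SAMPLE)
```

Mapping 0.0 to `nil` keeps unrated apps out of any later averages; whether that is the right choice depends on how the data is used.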
35 importer.rb
@@ -10,20 +10,45 @@ def items
    i = 0
    CSV.foreach("items.txt", csv_options) do |row|
      h = row.to_hash
-      puts h.inspect
+      #puts h.inspect
      i += 1
    end
-    puts "Imported #{i} entries."
+    puts "Processed #{i} items."
  end
+
+  def reviews
+    i = 0
+    CSV.foreach("reviews.txt", csv_options) do |row|
+      h = row.to_hash
+      #puts h.inspect
+      i += 1
+    end
+
+    puts "Processed #{i} reviews."
+  end
+
+  def screenshots
+    i = 0
+    CSV.foreach("screenshots.txt", csv_options) do |row|
+      h = row.to_hash
+      #puts h.inspect
+      i += 1
+    end
+    puts "Processed #{i} screenshots."
+  end
+
  protected
  def csv_options
-    {:col_sep => "\t", :headers => :first_row, :quote_char=>'"', :skip_blanks => true, :encoding => "u"}
+    # Tab delimited
+    {:col_sep => "\t", :headers => :first_row, :quote_char=>'"', :skip_blanks => true, :encoding => "utf-8"}
  end
end
-# Run the importer
-Importer.new.items()
+# Run the importers
+Importer.new.items()
+Importer.new.reviews()
+Importer.new.screenshots()
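
The three importer methods in the hunk above differ only in the file name and the label they print, so they could share one helper. The sketch below is one possible refactoring, not part of the commit: `CSV_OPTIONS` and `count_rows` are hypothetical names, and the `**` splat assumes Ruby 2.0 or later (on Ruby 1.9, as used by the original importer, pass the options hash as a plain second argument instead).

```ruby
require "csv"

# Same options as csv_options in the diff: tab-delimited, header row, UTF-8.
CSV_OPTIONS = {
  col_sep: "\t",
  headers: :first_row,
  quote_char: '"',
  skip_blanks: true,
  encoding: "utf-8"
}.freeze

# Hypothetical helper: count the data rows (header excluded) in one export file.
def count_rows(path)
  count = 0
  CSV.foreach(path, **CSV_OPTIONS) do |row|
    row.to_hash # placeholder for real per-row processing
    count += 1
  end
  count
end
```

With this helper, each export file could be processed with a call such as `puts "Processed #{count_rows("reviews.txt")} reviews."`, assuming the file sits in the working directory.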
5,496 items.txt
2,748 additions, 2,748 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
14,215 reviews.txt
14,215 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
2,428 screenshots.txt
2,428 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
