Implement a food scraper #15

Closed
qasim opened this issue Nov 30, 2015 · 9 comments

@qasim
Member

qasim commented Nov 30, 2015

map.utoronto.ca has food information. We can scrape that information and form something like the following:

{
  id: String,
  building_id: String,
  name: String,
  description: String,
  tags: [String],
  image: String,
  campus: String,
  lat: Number,
  lng: Number,
  address: String,
  hours: {
    sunday: {
      open: Number,
      close: Number
    },
    monday: {
      open: Number,
      close: Number
    },
    tuesday: {
      open: Number,
      close: Number
    },
    wednesday: {
      open: Number,
      close: Number
    },
    thursday: {
      open: Number,
      close: Number
    },
    friday: {
      open: Number,
      close: Number
    },
    saturday: {
      open: Number,
      close: Number
    }
  }
}
@qasim
Member Author

qasim commented Nov 30, 2015

Schema is open for opinions.

@kashav
Member

kashav commented Mar 30, 2016

I've started working on this. Current schema is looking like:

{
    id: String,
    building_id: String,
    name: String,
    short_name: String,
    description: String,
    url: String,
    tags: [String],
    image: String,
    campus: String,
    lat: Number,
    lng: Number,
    address: String,
    hours: {
        sunday: {
            closed: Boolean,
            open: String,
            close: String
        },
        monday: {
            closed: Boolean,
            open: String,
            close: String
        },
        tuesday: {
            closed: Boolean,
            open: String,
            close: String
        },
        wednesday: {
            closed: Boolean,
            open: String,
            close: String
        },
        thursday: {
            closed: Boolean,
            open: String,
            close: String
        },
        friday: {
            closed: Boolean,
            open: String,
            close: String
        },
        saturday: {
            closed: Boolean,
            open: String,
            close: String
        }
    }
}

The only problem with storing open / close as numbers was that there was no way to indicate the period (AM/PM). On some days the restaurant is closed, hence the closed boolean (open and close are empty strings in that case). I'm sure this could be simplified if need be.
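To make the closed-day convention concrete, here is a minimal sketch of how one day's entry might be built under this schema. The helper name `day_entry` is hypothetical and is not taken from the scraper; it only mirrors the rule described above (closed days carry empty strings for open and close).

```python
def day_entry(open_time="", close_time=""):
    """Build one day's hours entry in the string-based schema.

    Hypothetical helper: a day counts as closed when either time is missing,
    and closed days store empty strings for both open and close.
    """
    closed = not (open_time and close_time)
    return {
        "closed": closed,
        "open": "" if closed else open_time,
        "close": "" if closed else close_time,
    }


# Example: open on Monday, closed on Sunday.
hours = {
    "sunday": day_entry(),
    "monday": day_entry("8:00 AM", "5:00 PM"),
}
```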

@kashav
Member

kashav commented Mar 30, 2016

Also, I ended up having to duplicate get_value from the buildings scraper; it might be a good idea to move it into the superclass.
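One way the shared helper could look is sketched below. The thread does not say what get_value does, so everything here is an assumption: the class names `Scraper`, `Buildings`, and `Food`, the `number` parameter, and the behaviour (read a key from a scraped record, fall back to an empty default) are all hypothetical.

```python
class Scraper:
    """Hypothetical base class shared by the individual scrapers."""

    @staticmethod
    def get_value(entry, key, number=False):
        """Return entry[key] if present, else an empty default.

        Assumed behaviour: missing or null fields become 0 for numeric
        fields and '' for everything else.
        """
        if key in entry and entry[key] is not None:
            return entry[key]
        return 0 if number else ""


class Buildings(Scraper):
    """Placeholder for the buildings scraper; inherits get_value."""


class Food(Scraper):
    """Placeholder for the food scraper; no duplicated helper needed."""
```

With the helper on the base class, both scrapers would call `self.get_value(...)` (or `Food.get_value(...)`) instead of each keeping its own copy.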

@qasim
Member Author

qasim commented Mar 30, 2016

@kshvmdn I looked over the class; this looks really good.

The courses JSON uses numbers for time, converting every time to 24-hour clock format: a number in the range [0, 24), with 8 AM represented as 8 and 12:30 PM as 12.5, for example.

The initial motivation for storing time in this format is that it makes querying over time very low-friction: you can find times greater or less than some other time by simply comparing numbers.
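The conversion described above can be sketched as follows. This is only an illustration, not the scraper's actual code: the function name `to_decimal_hours` and the accepted input formats (such as "8:00 AM" or "12:30 p.m.") are assumptions.

```python
import re

# Matches e.g. "8 AM", "8:00 AM", "12:30 pm", "11:45 p.m."
TIME_PATTERN = re.compile(r"\s*(\d{1,2})(?::(\d{2}))?\s*([AaPp])\.?[Mm]\.?\s*")


def to_decimal_hours(text):
    """Convert a 12-hour clock string to a number of hours in [0, 24)."""
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")

    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    period = match.group(3).lower()
    if not (1 <= hour <= 12 and 0 <= minute < 60):
        raise ValueError(f"time out of range: {text!r}")

    hour %= 12              # 12 AM becomes 0; 12 PM becomes 0 before the offset
    if period == "p":
        hour += 12          # afternoon and evening hours
    return hour + minute / 60
```

Because the results are plain numbers, ordering works without any parsing at query time; for instance, `to_decimal_hours("9 AM") < to_decimal_hours("1:15 PM")` holds.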

@kashav
Member

kashav commented Mar 30, 2016

@qasim ahhh such an obvious solution -- i'll work on getting that implemented

@arkon
Contributor

arkon commented Mar 30, 2016

@qasim I'm wondering why you didn't go with a string formatted as hh:mm instead? Seems more readable.

@qasim
Member Author

qasim commented Mar 30, 2016

@arkon The following is a snippet from the filter endpoint. Basically, with the number format you can treat time values as plain numbers and query them with MongoDB's built-in $ne, $gt, $lt, $gte, and $lte comparison operators.

https://github.com/cobalt-uoft/cobalt/blob/master/src/api/courses/routes/filter.js#L294-L315

  if (['breadth', 'level', 'size', 'enrolment', 'start', 'end', 'duration'].indexOf(key) > -1) {
    // Integers and arrays of integers (mongo treats them the same)

    if (['size', 'enrolment', 'start', 'end', 'duration'].indexOf(key) > -1) {
      response.isMapReduce = true
      response.mapReduceData = part
    }

    if (part.operator === '-') {
      response.query[ABSOLUTE_KEYMAP[key]] = { $ne: part.value }
    } else if (part.operator === '>') {
      response.query[ABSOLUTE_KEYMAP[key]] = { $gt: part.value }
    } else if (part.operator === '<') {
      response.query[ABSOLUTE_KEYMAP[key]] = { $lt: part.value }
    } else if (part.operator === '>=') {
      response.query[ABSOLUTE_KEYMAP[key]] = { $gte: part.value }
    } else if (part.operator === '<=') {
      response.query[ABSOLUTE_KEYMAP[key]] = { $lte: part.value }
    } else {
      // Assume equality if no operator
      response.query[ABSOLUTE_KEYMAP[key]] = part.value
    }
  }
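For the food data, the same idea might look like the sketch below, written in Python with plain dicts standing in for MongoDB filter documents. The names `build_condition` and `open_at` are hypothetical, and the field paths (`hours.<day>.open`, `hours.<day>.close`) are assumed from the schema proposed earlier in this thread rather than from any finished API.

```python
# Maps the filter syntax's operators to MongoDB comparison operators,
# mirroring the branches in the snippet above.
OPERATORS = {"-": "$ne", ">": "$gt", "<": "$lt", ">=": "$gte", "<=": "$lte"}


def build_condition(operator, value):
    """Return the MongoDB condition for one parsed filter part."""
    mongo_op = OPERATORS.get(operator)
    if mongo_op is None:
        return value  # assume equality if no operator
    return {mongo_op: value}


def open_at(day, hour):
    """Filter for places whose hours on `day` include `hour`.

    `hour` is a decimal number in [0, 24), e.g. 12.5 for 12:30 PM.
    """
    return {
        f"hours.{day}.open": build_condition("<=", hour),
        f"hours.{day}.close": build_condition(">", hour),
    }
```

A call such as `open_at("monday", 12.5)` produces a filter that relies only on numeric comparison, which is the property that an hh:mm string would not give without extra parsing.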

@kashav
Member

kashav commented Mar 30, 2016

Added time conversion in kashav@8e163c2.

Had an odd case with this location, which has (what seems to be) a mistyped Monday opening time. Not sure whether we should ignore that time or keep the hacky solution I'm currently using.

@qasim
Member Author

qasim commented Mar 31, 2016

@kshvmdn let's stick with the hacky solution so we account for all the restaurants. Do you want to email the map people about that typo? Then when they fix it we can change that. 😊
