Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UTM important dates scraper #81

Merged
merged 9 commits into from
May 15, 2016
Merged

Conversation

anderson202
Copy link
Contributor

Implemented UTM important dates scraper

Anderson Ng Ho Yin added 6 commits May 12, 2016 15:48
added initial scraping code
This reverts commit 13fdda0.
Added basic scraping code
the scraper and now functional, but the JSON file names may have to be
changed
fixed bug and added more comments
Update date format to match ISO 8601 format
@anderson202 anderson202 changed the title Utmdates Added UTM important dates scraper May 12, 2016
@anderson202 anderson202 changed the title Added UTM important dates scraper Add UTM important dates scraper May 12, 2016
@g3wanghc
Copy link
Member

Can you squash your first 3 commits and capitalize the message? 😬

def scrape(location='.'):
def scrape(location='.', year=None): #scrapes most current sessions by default

year = year or now.year
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call datetime.datetime.now() here instead if it's needed.

@kashav
Copy link
Member

kashav commented May 13, 2016

  • The two date values (date / end_date) should also be in ISO 8601 format

  • You can use datetime to find the month (instead of the dictionary you're currently using)

    >>> datetime.datetime.strptime('February', '%B').strftime('%m')
    '02'
  • I think it'd be better to introduce a new scraper class, Dates for these important dates scrapers, the UTM one would be called UTMDates. I think Calendar was meant for the academic calendars (here's the UTM one for example).

@anderson202
Copy link
Contributor Author

Thanks, I'll try and implement the fixes.

Btw, how do I make changes to the commit messages and "squash" the first 3? Still learning how to use git sorry about that.

@qasim
Copy link
Member

qasim commented May 13, 2016

@anderson202 don't worry about that, I can squash them when I merge this to master :) (I don't actually know how to do it, but GitHub has a nice button for me when I choose "Merge pull request")

I'll try to review this tomorrow as well.

@arkon
Copy link
Contributor

arkon commented May 13, 2016

@kshvmdn +1 to the Dates class. That can wait for a separate PR though.

@g3wanghc
Copy link
Member

git rebase -i head~5

Removed unnecessary lines and implemented suggested fixes
@anderson202
Copy link
Contributor Author

anderson202 commented May 13, 2016

I've implemented a few of the suggested fixes. Not sure if some of the other suggestions was necessary, please let me know if they are or if any more changes are required.

def scrape(location='.'):
def scrape(location='.', year=None): #scrapes most current sessions by default

year = year or datetime.datetime.now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

year = year or datetime.datetime.now().year

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kshvmdn sorry about that, fixed now thanks.

@arkon
Copy link
Contributor

arkon commented May 13, 2016

Looks pretty good! There could be a few formatting changes to better follow PEP8, but that can be done at a later point.

@arkon
Copy link
Contributor

arkon commented May 13, 2016

One thing that should be changed though is moving it to a dates/ subdirectory since calendar/ is meant for course calendars.

@kashav
Copy link
Member

kashav commented May 14, 2016

Finally got the chance to try this - it looks good! I found some other small stuff that we may be able to improve on:

  • session - Add year and capitalize

    • So "Summer" becomes "2016 SUMMER" (using the year from date)
    • And "Fall/Winter" becomes "2016 FALL/WINTER" or "2017 FALL/WINTER" (also using the year from date)
  • I'm looking at the dates and we might see some overlaps between the summer and winter terms, I think the best solution here is to move the session key into the events list, so the schema becomes:

    "date":String,
    "events":[{
      "end_date":String,
      "session":String,
      "campus":String,
      "description":String
    }]

Other than that, I think this is really good!

@anderson202
Copy link
Contributor Author

Got it, will get them done ASAP.

@qasim
Copy link
Member

qasim commented May 15, 2016

@kshvmdn good to merge?

@kashav
Copy link
Member

kashav commented May 15, 2016

@qasim Besides the name/directory thing, it's good to go!

We can fix that in a future PR when adding date scrapers for the other campuses! 👍

@qasim
Copy link
Member

qasim commented May 15, 2016

@kshvmdn that's fine then.

@anderson202 thanks a lot for contributing. It's the work that people like you put into this that really pushes the project forward.

@qasim qasim merged commit 2064067 into cobalt-uoft:master May 15, 2016
@anderson202 anderson202 deleted the utmdates branch May 16, 2016 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants