Introspection after ResBaz Sydney 2017 lesson #41
Comments
Thanks so much for this write-up, @jnothman. It feels like there could be room for different web scraping lessons here: an 'intro to web scraping with tools' (focus on a tool, include an introduction to HTML/CSS) and a more advanced lesson, possibly 'web scraping with Python'. I could see this being multiple episodes within a single lesson, but it would have to be clear that the intention wasn't to use all the episodes in one teaching session. (@drjwbaker has suggested a similar approach in the OpenRefine lesson to me previously.) I feel that any tool introduced should follow the selectors we are teaching, so if we are teaching CSS selectors it seems odd then to use a tool that uses a similar but different selector syntax. Was there any feedback from the participants in terms of how useful they found it and whether it met their expectations?
Yes, having some optional components makes sense, but in any case a lesson
will be cropped to fit its schedule and audience when presenting it. One
question is whether the components should be alternatives (hard to maintain)
or extensions (which still has its challenges).
I did not collect feedback in an organised manner but hope to get a list of
participant emails to ask for feedback after the fact. (In focusing on
developing the lesson I didn't prepare enough for that aspect.) I had one
strong "I'm struggling" response between CSS selectors and visual scraping.
I had the sense that most other people were following along well and asking
appropriate questions about the exercises, and I got a couple of positive
comments.
(In reply to Owen Stephens' comment above, posted 4 Jul 2017, 6:25 pm.)
Great workshop, and great summary @jnothman, I think you picked out all the key points. I will add that, having attended with absolutely zero web scraping experience, I got a lot out of it! I agree that this could be broken down into a basic (visual) and an advanced (code-based) lesson. The mechanics of how to do that best I would leave to you... :) The only things I would add are: b) maybe after introducing the concept of one or two selectors it would be good to jump straight into the visual scraper tool and try this out on the simple webpage. This could be followed up by the more in-depth discussion of various CSS selectors and the UNSC example. I think this would help cement the concept and break up the theoretical discussion at the start. Last point: personally I think the UNSC example is really good. The quirks of this site show how difficult good scraping can be.
This afternoon, I had 3h (including a 10 min break) to present web scraping. I presented from https://ctds-usyd.github.io/2017-07-03-resbaz-webscraping/. I am not a trained SWC instructor, and not used to the narrative format of SWC lessons. I am also an experienced software engineer, so while I am used to some amount of teaching, it was hard for me to recall how much groundwork there is to this topic. In the context of ResBaz, I was presenting to a group of research students, librarians, ?academics, etc. from Sydney universities. I did not get anything in the way of a survey, but hope to ask the ResBaz organisers to email students for their comments.
There were about 22 students, though 40 had signed up. Despite the Library Carpentry resolutions of a few weeks ago to focus on coding scrapers, I had decided to make something accessible to non-coders. In the end, we did not cover the coding part at all. I don't think we suffered greatly for this.
What we managed to cover
We covered, perhaps, half the material:
Good points
Things deserving attention
Overall
There is far too much narrative before getting hands dirty. Even so, students seemed to appreciate the "what web scraping is not" section, at least to some extent. It could probably be moved to the conclusion.
Students who were not well grounded in the structure of web pages struggled.
I had two projector screens. Even so, it is challenging to set up a visual projection that covers: the lesson, the page being scraped, source code or element inspector for a page being scraped, the scraping tool or code...
I think it would be good to focus on a visual scraper, but then have a number of scripts in several scraping frameworks and languages available as supplementary material to the lesson. A discussion of the nuances of coding these things by hand can be left brief, or available with more description for an extended lesson.
I feel that visual scrapers are a good way to demonstrate what we're up to with little coding competence required, and are in practice a useful technology to grok.
The key thing we need to consider is to what extent we make this available with a "choose your own adventure: CSS vs XPath; visual vs requests/lxml vs scrapy" approach, or as a single well-honed curriculum that works for most people.
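To make the "CSS vs XPath" and "visual vs coded" choice above a little more concrete, here is a minimal sketch of a hand-coded scraper. It is an illustration only, not material from the lesson: the page fragment, the `extract_rows` helper and its field names are all invented here. It also assumes well-formed markup and uses Python's standard-library `xml.etree.ElementTree` (which supports only a subset of XPath) instead of requests/lxml or scrapy, purely so that it runs without installing anything. The CSS selector a visual scraper would typically use for the same element is shown in a comment beside each path.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment invented for this sketch.
PAGE = """
<html><body>
  <ul id="documents">
    <li><a href="/docs/a">Example document A</a> <span class="date">2017-01-01</span></li>
    <li><a href="/docs/b">Example document B</a> <span class="date">2017-01-02</span></li>
  </ul>
</body></html>
"""


def extract_rows(markup):
    """Return one dict per list item (hypothetical helper, not from the lesson)."""
    root = ET.fromstring(markup)
    rows = []
    # XPath: .//ul[@id='documents']/li      CSS: ul#documents > li
    for item in root.findall(".//ul[@id='documents']/li"):
        link = item.find("a")                      # CSS: a
        date = item.find("span[@class='date']")    # CSS: span.date
        rows.append({
            "title": link.text,
            "url": link.get("href"),  # an attribute, not visible text
            "date": date.text,
        })
    return rows


if __name__ == "__main__":
    for row in extract_rows(PAGE):
        print(row)
```

The same structure (select the repeating container, then select fields relative to it) is what a visual scraper asks for through its interface, which is one argument for teaching the visual tool first and offering scripts like this as supplementary material.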
CSS selectors
[...] the <catfood> example is poorer for only having one of each tag name.
Visual scraping
[...] :nth-of-type, which refers to tag name.
[...] href, and spoke of machine-readable publication dates (with microdata) in news sites. Also could have mentioned a's title attr. What else? Worth writing a paragraph on in the lesson, perhaps.
I'll offer my lessons across to this repo shortly.
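The two points above (that :nth-of-type counts siblings by tag name, and that useful data often lives in attributes such as href, title, or a microdata publication date) can be sketched as follows. This is a hedged illustration, not code from the lesson: the article fragment and the `nth_of_type` helper are invented here, the markup is assumed to be well-formed, and the standard-library `xml.etree.ElementTree` stands in for whatever scraping library a lesson would actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment invented for this sketch.
ARTICLE = """
<article>
  <h1>Example headline</h1>
  <p>First paragraph.</p>
  <div>An aside.</div>
  <p>Second paragraph with <a href="/about" title="About this site">a link</a>.</p>
  <time itemprop="datePublished" datetime="2017-01-01">1 January 2017</time>
</article>
"""


def nth_of_type(parent, tag, n):
    """Return the n-th child of `parent` named `tag` (1-based), or None.

    Mimics CSS `tag:nth-of-type(n)`: only siblings with the same tag
    name are counted. Hypothetical helper written for this example.
    """
    same_tag = [child for child in parent if child.tag == tag]
    return same_tag[n - 1] if 0 < n <= len(same_tag) else None


root = ET.fromstring(ARTICLE)

# p:nth-of-type(2) ignores the <h1> and <div>: only <p> siblings count.
second_p = nth_of_type(root, "p", 2)

# By contrast, the second child overall (what :nth-child(2) looks at)
# is the first <p>, because the <h1> occupies position 1.
second_child = list(root)[1]

# Attributes carry data that never shows up as visible text.
link = second_p.find("a")
published = root.find("time[@itemprop='datePublished']")
record = {
    "link_text": link.text,
    "href": link.get("href"),
    "title": link.get("title"),
    "published": published.get("datetime"),  # machine-readable date
}

if __name__ == "__main__":
    print(second_p.text, "|", second_child.text)
    print(record)
```

This also hints at why the <catfood> example is limiting: when every tag name occurs only once, :nth-of-type and :nth-child cannot be told apart.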
Anything to add, @nikzadb, @Anushi, @RichardPBerry?