Add config options to big scrape, separate scraping and writing within big scrape. #76

lfashby · 2019-10-25T00:04:54Z

(Sorry for the wall of text...)
These changes are meant to address issues #66, #67, #68 as well as a few suggestions made in the comments of pull request #61.

Here are the larger changes introduced by this pull request:

Separated the scraping and writing part of scrape.py (formerly scrape_and_write.py).
- write.py now generates the README table by inspecting the contents of the tsv/ directory. In the process it also creates a tsv readme_tsv.tsv with similar information as is in the README table.
Added no_stress, no_syllable_boundaries, and cut_off_date options to languages in languages.json.
- Modified codes.py to specify default values for these options when adding new languages to languages.json and to copy over previously set values for these options.
- cut_off_date should now be set in codes.py prior to running codes.py, I’ve updated the README in languages/wikipron with those instructions.
Added dialect config option (and require_dialect_label option) to English, Spanish and Portuguese.
- Restructured scrape.py to handle when one or more dialects are specified for a language. (Ran this new code on Portuguese, because it is a smaller language, to generate some sample data.)
- README table now includes dialect information in Wiktionary language name column

The only small changes worth noting are:

Logging in scrape.py will now also output to scraping.log which I’ve added to .gitignore. This way finding the languages that failed to be scraped is a bit easier (don’t need to scroll through the console). It also outputs the language dict from languages.json in the error message for languages that failed to be scraped within our set amount of retries, so it is a bit easier to build a temporary languages.json with the failed languages.
scrape.py will now remove files with less than 100 entries. (TSVs with less than 100 entries have been removed.)

I have a few questions regarding dialects that I'd like your thoughts on:

How dialects are handled in languages.json.
- I added dialects to languages.json in the following way:
```
"por": {
    ...
    "require_dialect_label": true,
    "dialect": {
        "_bz": "Brazil",
        "_po": "Portugal"
    },
    ...
},
```
- The keys (_bz, _po) in dialect serve as a sort of extension when naming the dialect tsv files (por_bz_phonetic.tsv, for example) and help with easy access to the dialect strings ("Brazil") in write.py. If you'd like me to change any of the keys because you'd like different extensions for certain dialects let me know. They can be longer than two letters. I'll provide links to the English, Spanish, and Portuguese entries in languages.json as a separate comment so you can review the keys and dialect strings I'm using.
- Is there a process for finding which dialects are frequently used within a given Wiktionary language category? (Aside from just checking entries and seeing whether dialect information is specified.)
How dialects are handled in scrape.py.
- As written scrape.py will first scrape a language entirely and then scrape for dialects if any are specified. This means it will scrape por (Portuguese) with no dialect, then por with "Brazil" as the dialect and then por with "Portugal" as the dialect. Is there any reason to scrape for por with no dialect when we are specifying a dialect? Do we want to keep the tsvs generated from previously scraping por (or eng/spa) with no dialect?
- Within scrape.py, I moved a lot of what was in main() to a separate function in order to handle dialects. There may well be a better way of handling dialects than the way I've tried to do it and I'm open to suggestions on how to improve it.

lfashby · 2019-10-25T00:15:19Z

These are what the English, Spanish and Portuguese entries look like in languages.json.

https://github.com/kylebgorman/wikipron/blob/faee11782214894fced4e370bc043b8792ac575a/languages/wikipron/src/languages.json#L362-L375

https://github.com/kylebgorman/wikipron/blob/faee11782214894fced4e370bc043b8792ac575a/languages/wikipron/src/languages.json#L1290-L1303

https://github.com/kylebgorman/wikipron/blob/faee11782214894fced4e370bc043b8792ac575a/languages/wikipron/src/languages.json#L1159-L1172

If you'd like, for example, the "_ca" in Spanish to be "_cs" or any other changes let me know.

lfashby · 2019-10-25T00:24:41Z

This is a link to the README table so you can see how Portuguese looks with dialect info in the Wiktionary name column. I don't think it would make sense to add another column for dialect info as the table is already quite wide and it would only apply to a few languages.

This is a link to the tsv of the README table. It contains mostly the same information as is in the README, but just lists the file name in the link column instead of a relative path in a markdown link. I could potentially add headers if that is useful. A separate column for dialects may be relevant in this format.

kylebgorman · 2019-10-25T00:45:31Z

Just wanted to comment on some of the API choices before I dive into code. This is a lot of stuff at once, congrats on getting it all together so fast.

I don't care for the name readme_tsv.tsv; it duplicates "tsv" (duh) and the fact that it is the same data as the README table is irrelevant to me. What is relevant to me as a reader is that it's a TSV file that contains the counts of data for all the languages. (languages.tsv maybe?)
Can you rename write.py to something more informative? Maybe summarize.py or generate_summary.py since that's what it does?
I personally have no use for the "all-dialectal" versions of English, Portuguese, and Spanish, and if someone wants them, they should just concatenate (and sort) the dialectal ones, right? So I'd suggest that if dialect is specified it should pre-empt an all-dialects scrape. (If someone really wanted one they could add THAT to the dialect specification in languages.json.)
Your approach to adding dialect suffixes is fine but I would suggest that your code should add the _ between the language and the dialect code, rather than including it in the JSON file, which feels like an abstractional leak. So it would look like

"dialect": {
"us": "US | General American",
"uk": "UK | Received Pronunciation"
},

but would still produce en_us_phonemic.tsv etc.

kylebgorman · 2019-10-25T00:50:17Z

languages/wikipron/src/codes.py

+        "casefold": None,
+        "no_stress": True,
+        "no_syllable_boundaries": True,
+        "cut_off_date": CUT_OFF_DATE


apply reflow-er? I see there's no trailing comma here ;)

kylebgorman · 2019-10-25T00:50:28Z

languages/wikipron/src/codes.py

    with open("languages.json", "r") as source:
        prev_languages = json.load(source)
    with open("iso-639-3_20190408.tsv", "r") as source:
        iso_list = csv.reader(source, delimiter="\t")
-        for (lang_page_title, total_pages) in _cat_members():
+        for (lang_page_title) in _cat_members():


remove extraneous parens here (or I think reflow-er will take care of that)

kylebgorman · 2019-10-25T00:51:41Z

languages/wikipron/src/codes.py

+# Set CUT_OFF_DATE below.
+# CUT_OFF_DATE can be no later than the date
+# at which you plan to begin scraping.
+CUT_OFF_DATE = "2019-11-01"


I'm a little confused why this is given as a default here. Why not use datetime.datetime.today().isoformat() or something? (For our particular case where we really want to do a cut-off-date'd scrape we can just search-and-replace on languages.json, right?)

For some reason I interpreted #68 as “Language entries in languages.json should contains all config options that do not use default values set in config.py.”

So I guess I wasted some time doing that. It would be simpler if the language entries looked like they were before:

"est": { "iso639_name": "Estonian", "wiktionary_name": "Estonian", "wiktionary_code": "et", "casefold": true },

Or this if a dialect is used:

"eng": { "iso639_name": "English", "wiktionary_name": "English", "wiktionary_code": "en", "casefold": true, "dialect": { "us": "US | General American", "uk": "UK | Received Pronunciation" } },

And then config_settings in scrape.py would handle setting no_stress, no_syllable_boundaries, and cut_off_date, since they are always the same. Then we can just set cut_off_date with datetime in scrape.py.

kylebgorman · 2019-10-25T00:53:55Z

languages/wikipron/src/codes.py

@@ -21,6 +21,7 @@
 * Dialect information may also need to be added manually
 """

+# TODO Generate and use a lookup table for the iso639-3 tsv file


Generic minor comment throughout this change: TSV when you're talking about it in a comment (okay to have a variable name like tsv cuz snake_casing conventions).

kylebgorman · 2019-10-25T00:56:13Z

languages/wikipron/src/scrape.py

+    # Removes TSV files with less than 100 entries.
+    if phonemic_count < 100:
+        logging.info(
+            '"%s", has less than 100 entries in phonemic transcription.',


This is going to say:

Amharic, has less than 100 entries...

Is the comma there intended?

kylebgorman · 2019-10-25T00:57:29Z

languages/wikipron/src/scrape.py



 if __name__ == "__main__":
    logging.basicConfig(
        format="%(filename)s %(levelname)s: %(asctime)s - %(message)s",
+        handlers=[
+            logging.FileHandler("scraping.log", mode="w"),


possibly put filename path here into a module-level constant

kylebgorman · 2019-10-25T00:57:51Z

languages/wikipron/src/write.py

+
+
+def _readme_insert(wiki_name, row_string):
+    with open("../tsv/README.md", "r+") as source:


suggest this path as a module-level constant

kylebgorman · 2019-10-25T01:00:06Z

languages/wikipron/src/write.py

+            # Rewrite the entire README.
+            source.seek(0)
+            source.truncate()
+            source.write("".join(readme_list))


If each element in readme_list was a line w/o newlines you could use the more-efficient writelines (which takes an iterator of strings and doesn't build up the whole big string at once). But this part is so complex I'm afraid to ask you to kick the hornet's nest...

I'll look into this.

kylebgorman · 2019-10-25T01:01:06Z

languages/wikipron/src/write.py

+                file.rindex("_") + 1 : file.index(".")
+            ].capitalize()
+            wiki_name = languages[iso639_code]["wiktionary_name"]
+            # Assumes we will not remove eng and spa tsv files


Is this comment still current? Will it become irrelevant shortly? If so remove.

kylebgorman · 2019-10-25T01:03:37Z

languages/wikipron/src/write.py

+        def sorting(ele):
+            return ele[3]
+        readme_tsv_list.sort(key=sorting)
+        for lang_row in readme_tsv_list:


Strongly suggest you use csv.writer for this (you just need to tell it you're using a "\t" separator). Then call it's writerows method with readme_tsv_list as an argument and you're free of this loop too!

kylebgorman

Reviews scattered throughout...only serious concerns are about file naming etc. I'm traveling this weekend so ping on email if you need timely attention. Things looking quite good overall.

lfashby · 2019-10-25T02:57:17Z

Okay, I'll start working on this tomorrow and just reply to a few comments now.

lfashby · 2019-10-25T03:12:30Z

In addition to changing the name of readme_tsv.tsv I think I should also change where it gets outputted. Right now it just gets placed in the src directory, but I presume the tsv directory is the more appropriate place.

kylebgorman · 2019-10-25T11:42:09Z

Not sure. Maybe the parent directory of src/ and tsv/ ? That way there’s no risk of it being confused with an actual language file. K

…

On Thu, Oct 24, 2019 at 11:12 PM Lucas Ashby ***@***.***> wrote: In addition to changing the name of readme_tsv.tsv I think I should also change where it gets outputted. Right now it just gets placed in the src directory, but I presume the tsv directory is the more appropriate place. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/kylebgorman/wikipron/pull/76?email_source=notifications&email_token=AABG4OODNAGTVWX5L3BSDFDQQJP2DA5CNFSM4JE44HH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECHBMEQ#issuecomment-546182674>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABG4OMJZSK6JAITHWZN2J3QQJP2DANCNFSM4JE44HHQ> .

… write.py, added tsv of readme creation

….py to handle new languages.json format

… 100 entries, changed logging in scrape.py

…age names from SIL website, re-ran codes.py and write.py, added additional steps to README explaining src python files

…file created in scrape.py

…reformatted scrape.py, removed needless config options from codes.py and languages.json

lfashby · 2019-10-25T19:42:20Z

I think these changes touch on all the comments Kyle made. Aside from formatting and file name stuff, the big differences are:

I renamed and reworked write.py (now generate_summary.py) to be more efficient. Formerly it would rewrite the README (by reading it in, splitting the rows into a list of lists, inserting a new row, converting the list of lists into a big string and writing that string) every time we iterated through a file in the tsv directory. Now it just constructs a list of lists once, converts those internal lists to strings once, and writes an all new README once. It's perhaps still not ideal, but I feel better about it.
I added what I thought were important paths as constants to codes.py. (Not sure if I'm using module level constants correctly.)
I removed English, Spanish, and Portuguese TSVs that don't use a dialect (in preparation for the next scrape).

kylebgorman

Couple minor things left, but looking good. @jacksonllee, do you have any final thoughts on this?

kylebgorman · 2019-10-25T21:38:49Z

languages/wikipron/src/generate_summary.py

+
+    # Sort by wiktionary language name,
+    # with phonemic entries before phonetic
+    def sorting(ele):


still just use a lambda here.

As in:

languages_summary_list.sort(key=lambda ele: ele[3] + ele[5]) readme_list.sort(key=lambda ele: ele[3] + ele[5])

Or something along the lines of:

f = lambda ele: ele[3] + ele[5] languages_summary_list.sort(key=f) readme_list.sort(key=f)

The later is 'DRY-er' but isn't assigning a lambda to a variable an anti-pattern?

kylebgorman · 2019-10-25T21:43:21Z

languages/wikipron/src/scrape.py

            os.remove(phonemic_path)
+        if os.path.exists(phonetic_path):


Just noticed this. This is a "dark pattern" because it is theoretically possible for the file to blink in or out of existence between the check for existence and the remove. The recommended pattern is instead to do a try: os.remove(...) and then catch and ignore the exception (OSError probably) if it fails. I realize this is an annoying one...

Found this from googling:

from contextlib import suppress with suppress(OSError): os.remove(filename)

Seems nicer than ignoring the exception. Let me know if you have a preference. Is it the os.path.exists() check that makes the file potentially blink in or out of existence? Do I need to add this to all os.remove() statements or just the ones that involve os.path.exists()

kylebgorman · 2019-10-25T21:43:34Z

languages/wikipron/src/scrape.py

-            phonemic_readme_row_string = (
-                "| " + " | ".join(phonemic_row) + " |\n"
+        return
+    # Remove files for languages that returned nothing


period for consistency on this comment I suppose

kylebgorman · 2019-10-25T21:43:53Z

languages/wikipron/src/scrape.py

-                "| " + " | ".join(phonemic_row) + " |\n"
+        return
+    # Remove files for languages that returned nothing
+    elif phonemic_count == 0 and phonetic_count == 0:


probably if not phonemic_count and not phonetic_count or something like that.

kylebgorman · 2019-10-25T22:13:26Z

Former is better and hopefully the reflower can make it pretty too.

…

On Fri, Oct 25, 2019 at 6:06 PM Lucas Ashby ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In languages/wikipron/src/generate_summary.py <https://github.com/kylebgorman/wikipron/pull/76#discussion_r339255133>: > + + row = [ + iso639_code, + languages[iso639_code]["iso639_name"], + wiki_name, + str(languages[iso639_code]["casefold"]), + transcription_level, + str(num_of_entries), + ] + # TSV and README have different first column. + languages_summary_list.append([file] + row) + readme_list.append([f"[TSV]({file})"] + row) + + # Sort by wiktionary language name, + # with phonemic entries before phonetic + def sorting(ele): As in: languages_summary_list.sort(key=lambda ele: ele[3] + ele[5]) readme_list.sort(key=lambda ele: ele[3] + ele[5]) Or something along the lines of: f = lambda ele: ele[3] + ele[5] languages_summary_list.sort(key=f) readme_list.sort(key=f) The later is 'DRY-er' but isn't assigning a lambda to a variable an anti-pattern? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/kylebgorman/wikipron/pull/76?email_source=notifications&email_token=AABG4OO4CCTKSXRH4HM4I5LQQNUWHA5CNFSM4JE44HH2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCJJ2WVI#discussion_r339255133>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABG4OP2FUYEI4PUWANXOZDQQNUWHANCNFSM4JE44HHQ> .

lfashby · 2019-10-25T22:24:49Z

I'm not sure what reflower or reflow-er is and searching 'reflower python' doesn't seem to bring up anything useful.

kylebgorman · 2019-10-26T03:54:08Z

Sorry, I’m referring to ‘white’.

…

On Fri, Oct 25, 2019 at 6:24 PM Lucas Ashby ***@***.***> wrote: I'm not sure what reflower or reflow-er is and searching 'reflower python' doesn't seem to bring up anything useful. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/kylebgorman/wikipron/pull/76?email_source=notifications&email_token=AABG4OMEHVGLQ73H36WCWBLQQNW3DA5CNFSM4JE44HH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECJWZMQ#issuecomment-546532530>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABG4OJLQ6YLZ4RRIR6JZZ3QQNW3DANCNFSM4JE44HHQ> .

… in generate_summary.py, reworked removing files in scrape.py

lfashby · 2019-10-26T19:52:58Z

I made the above changes and ran white. There were a few things I could do with the os.remove try-except block but I just went with the most straightforward method and added some comments above it.

jacksonllee

Just a few minor things (plus a new, separate bug unrelated to this PR).

jacksonllee · 2019-10-27T04:16:51Z

languages/wikipron/README.md

-Wiktionary languages with over 100 entries. It writes the results of those
-scrape calls to TSVs and generates a [README](./tsv/README.md) with selected
-information regarding the contents of those TSVs and the configuration settings
+Wiktionary languages with over 100 entries. [write.py](./src/write.py)


write.py -> generate_summary.py?

jacksonllee · 2019-10-27T04:18:22Z

languages/wikipron/README.md

 2.  Run [scrape.py](./src/scrape.py).
+3.  Run [write.py](./src/write.py).


generate_summary.py

jacksonllee · 2019-10-27T04:26:54Z

languages/wikipron/src/generate_summary.py

+    readme_list = []
+    languages_summary_list = []
+    path = "../tsv"
+    for file in os.listdir(path):


file -> file_path or something?

jacksonllee · 2019-10-27T04:42:34Z

languages/wikipron/tsv/por_bz_phonetic.tsv

+adaga	a d a ɡ a
+adenina	a d e n i n a
+adeus	a d e w s
+adição	a d͡ ʒ i s ɐ̃ w


Oops, we forgot about having to correctly handle the tie bar for affricates. I'm opening an issue for this now. (Edit: Just opened #78)

lfashby · 2019-10-27T15:21:06Z

boolean coercion here?

You mean like if not phonemic_count and not phonetic count as in the if block below it?

I could potentially set up _call_scrape() so it always returns an int and then remove the files when that int is less than 100, but the idea behind having these separate if blocks and the reason _call_scrape() can return either an int or None is so we can log the different ways languages can fail.

kylebgorman · 2019-10-27T19:50:45Z

I don't know if this is a good case for a returning heterogeneous types. If you want to do that any of the following would be preferable IMO: 1. raise an exception 2. return 0 as a default 3. return -1 if it's important to distinguish between that and 0 I think #2 is probably the best option.

…

On Sun, Oct 27, 2019 at 11:21 AM Lucas Ashby ***@***.***> wrote: boolean coercion here? You mean like if not phonemic_count and not phonetic count as in the if block below it? I could potentially set up _call_scrape() so it always returns an int and then remove the files when that int is less than 100, but the idea behind having these separate if blocks and the reason _call_scrape() can return either an int or None is so we can log the different ways languages can fail. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/kylebgorman/wikipron/pull/76?email_source=notifications&email_token=AABG4OPT2HCZSGQSSDUG3ALQQWWWFA5CNFSM4JE44HH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECLAXQA#issuecomment-546704320>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABG4OKCAGFYCPB5RVHTR6TQQWWWFANCNFSM4JE44HHQ> .

kylebgorman · 2019-10-28T17:07:27Z

languages/wikipron/README.md

 that were passed to scrape. [languages.json](./src/languages.json) provides
 [scrape.py](./src/scrape.py) with a dictionary containing the information it
 needs to call scrape on all Wiktionary languages with over 100 entries as well
 as to generate the previously mentioned [README](./tsv/README.md).
 [codes.py](./src/codes.py) is used to generate
 [languages.json](./src/languages.json). It queries Wiktionary for all languages
 with over 100 entries. It also outputs
-[failed\_langauges.json](./src/failed_languages.json), a list of languages on
+[unmatched\_langauges.json](./src/unmatched_languages.json), a list of languages on


langauges

kylebgorman · 2019-10-28T17:08:36Z

languages/wikipron/src/codes.py

-    failed_languages = {}
-    with open("languages.json", "r") as source:
+    unmatched_languages = {}
+    with open(LANGUAGES_PATH, "r") as source:
        prev_languages = json.load(source)
    with open("iso-639-3_20190408.tsv", "r") as source:


Model level constant?

kylebgorman · 2019-10-28T17:08:58Z

languages/wikipron/src/codes.py

-        failed_dict = json.dumps(failed_languages, indent=4)
-        failed.write(failed_dict)
+    # All languages that failed to be matched with data in ISO 639 TSV file.
+    with open("unmatched_languages.json", "w") as unmatched:


module-level constant

kylebgorman · 2019-10-28T17:09:55Z

languages/wikipron/src/generate_summary.py

+        languages_summary_list.append([file_path] + row)
+        readme_list.append([f"[TSV]({file_path})"] + row)
+
+    # Sort by wiktionary language name,


Wiktionary

kylebgorman · 2019-10-28T17:10:56Z

languages/wikipron/src/generate_summary.py

+
+    # Sort by wiktionary language name,
+    # with phonemic entries before phonetic.
+    languages_summary_list.sort(key=lambda ele: ele[3] + ele[5])


Okay, so now you're using this twice, so probably define it outside of main as _3_and_5 or something like that?

kylebgorman · 2019-10-28T17:12:23Z

languages/wikipron/src/scrape.py

+    return 0
+
+
+def build_config_and_filter_scrape_results(


maybe too long, also maybe you want _ at the beginning

…s, fixed typos

kylebgorman · 2019-10-28T18:29:45Z

Other than that, looks good to me.

kylebgorman

LGTM. please use squash-and-merge.

jacksonllee mentioned this pull request Oct 25, 2019

Remove "require_dialect_label" #77

Merged

kylebgorman reviewed Oct 25, 2019

View reviewed changes

lfashby added 9 commits October 25, 2019 09:47

Updated languages.json with dialects, moved writing from scrape.py to…

1e35898

… write.py, added tsv of readme creation

Added more config options to languages.json, rewrote and tested codes…

bbf42d4

….py to handle new languages.json format

Removed page_entries from languages.json, removed tsvs with less than…

72bee65

… 100 entries, changed logging in scrape.py

Removed small language rows from readme, first pass on re-writing scrape

d7467f4

Modified write.py to handle dialects, updated iso tsv file with langu…

614d9ce

…age names from SIL website, re-ran codes.py and write.py, added additional steps to README explaining src python files

Tested dialects with Portuguese, updated gitingore to ignore logging …

6ac1f52

…file created in scrape.py

Added comments and minor formatting to core src python files

fe217fb

Updated README describing src python files

779ec97

Removed eng/spa/por (no dialect) tsvs, renamed and rewrote write.py, …

ac23a48

…reformatted scrape.py, removed needless config options from codes.py and languages.json

lfashby force-pushed the config_options branch from faee117 to ac23a48 Compare October 25, 2019 19:24

kylebgorman reviewed Oct 25, 2019

View reviewed changes

Ran white on codes.py/scrape.py/generate_summary.py, reworked sorting…

2d43a5b

… in generate_summary.py, reworked removing files in scrape.py

jacksonllee reviewed Oct 27, 2019

View reviewed changes

jacksonllee mentioned this pull request Oct 27, 2019

Handle the tie bar for affricates in IPA segmentation #78

Closed

Reworked removing tsvs in scrape.py

ecd41ff

kylebgorman reviewed Oct 28, 2019

View reviewed changes

Renamed function in scrape.py, add all paths as module level constant…

04c8d99

…s, fixed typos

Reworked iterating through dialects in scrape.py

2ac3187

kylebgorman self-requested a review October 28, 2019 20:10

kylebgorman approved these changes Oct 28, 2019

View reviewed changes

lfashby merged commit 8010798 into CUNY-CL:master Oct 29, 2019

lfashby deleted the config_options branch October 29, 2019 00:07



		def _readme_insert(wiki_name, row_string):
		with open("../tsv/README.md", "r+") as source:

		2. Run [scrape.py](./src/scrape.py).
		3. Run [write.py](./src/write.py).

Add config options to big scrape, separate scraping and writing within big scrape. #76

Add config options to big scrape, separate scraping and writing within big scrape. #76

Conversation

lfashby commented Oct 25, 2019 • edited

lfashby commented Oct 25, 2019

lfashby commented Oct 25, 2019

kylebgorman commented Oct 25, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kylebgorman Oct 25, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kylebgorman left a comment

Choose a reason for hiding this comment

lfashby commented Oct 25, 2019

lfashby commented Oct 25, 2019

kylebgorman commented Oct 25, 2019 via email

lfashby commented Oct 25, 2019 • edited

kylebgorman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kylebgorman commented Oct 25, 2019 via email

lfashby commented Oct 25, 2019

kylebgorman commented Oct 26, 2019 via email

lfashby commented Oct 26, 2019

jacksonllee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jacksonllee Oct 27, 2019 • edited

Choose a reason for hiding this comment

lfashby commented Oct 27, 2019

kylebgorman commented Oct 27, 2019 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kylebgorman commented Oct 28, 2019

kylebgorman left a comment

Choose a reason for hiding this comment

lfashby commented Oct 25, 2019 •

edited

kylebgorman Oct 25, 2019 •

edited

lfashby commented Oct 25, 2019 •

edited

jacksonllee Oct 27, 2019 •

edited