enhance data export scripts #7
Comments
As a note, here's the quick script I ran to export the 'dwellings' data from a running mep-django installation. I will write this up as a PR to mep-django as an export_dwellings.py command.

# export_dwellings.py
# allow django models in use
import os, django, datetime as dt
import pickle
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mep.settings")
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"
django.setup()
# model imports must come after django.setup()
from mep.people.models import Person
from mep.common.utils import absolutize_url

def export_dwellings():
    old = []
    # one output record per (person, account, address) combination
    for person in Person.objects.all():
        print(person)
        for account in person.account_set.all():
            for addr in account.address_set.all():
                loc = addr.location
                odx = dict(
                    # Member
                    member_uri=absolutize_url(person.get_absolute_url()),
                    # IDs
                    person_id=person.id,
                    account_id=account.id,
                    address_id=addr.id,
                    location_id=loc.id,
                    # Address data
                    start_date=addr.start_date,
                    end_date=addr.end_date,
                    start_date_precision=addr.start_date_precision,
                    end_date_precision=addr.end_date_precision,
                    care_of_person_id=addr.care_of_person_id,
                    # Location data
                    street_address=loc.street_address,
                    city=loc.city,
                    postal_code=loc.postal_code,
                    latitude=loc.latitude,
                    longitude=loc.longitude,
                    country_id=loc.country_id,
                    arrondissement=loc.arrondissement(),
                )
                old.append(odx)
    # write all records to a date-stamped pickle file
    now = dt.datetime.now()
    ofn = f'dwellings.{now:%Y-%m-%d}.pkl'
    with open(ofn, 'wb') as of:
        pickle.dump(old, of)

if __name__ == '__main__':
    export_dwellings()
Note: the code above still needs to be adapted to follow the model of export_members.py.
@quadrismegistus @jkotin some comments on the address data export enhancements:
- Here's the relevant GitHub issue in the mep-django codebase:
- I don't think we should introduce a new term ("dwellings"); I strongly recommend we continue to call these addresses for consistency with previous versions of the datasets.
- My plan is for exported address data to be packaged with the members data for a new version of the members dataset. We'll have to update the datapackage validation and dataset readme to document the fields in the addresses export and how the two files relate. We'll also need to clearly document this in the dataset change log.
- There will be redundancy with the main member export data, but I think we should keep that for backwards compatibility.
- I suggest the new export filename should be
- I still think it would be incredibly valuable to have a GeoJSON export of this data, because that would make the data usable with so many tools; that would require additional work, but there must be Python packages that would help with this. (Maybe something in GeoDjango would be useful.)
Field-specific comments:
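To make the GeoJSON suggestion above more concrete, here is a rough sketch of how exported address records could be turned into a GeoJSON FeatureCollection using only the standard library. It assumes each record is a plain dict with the field names used in the quick script (latitude, longitude, street_address, and so on); the function name addresses_to_geojson and the sample values are hypothetical and not part of mep-django.

```python
import json


def addresses_to_geojson(records):
    """Convert address dicts into a GeoJSON FeatureCollection (as a dict).

    Hypothetical helper: assumes each record carries "latitude" and
    "longitude" keys, as in the quick export script.
    """
    features = []
    for rec in records:
        lat, lon = rec.get("latitude"), rec.get("longitude")
        if lat is None or lon is None:
            continue  # skip locations without coordinates
        # everything except the coordinates becomes a feature property
        properties = {
            key: value
            for key, value in rec.items()
            if key not in ("latitude", "longitude")
        }
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are ordered [longitude, latitude]
                "geometry": {
                    "type": "Point",
                    "coordinates": [float(lon), float(lat)],
                },
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # illustrative values only, not real export data
    sample = [
        {
            "member_uri": "https://example.org/members/example/",
            "street_address": "1 rue Exemple",
            "city": "Paris",
            "latitude": 48.85,
            "longitude": 2.34,
        }
    ]
    # default=str serializes dates and Decimal values as strings
    print(json.dumps(addresses_to_geojson(sample), indent=2, default=str))
```

Records without coordinates are dropped here; a real export might instead keep them with a null geometry, which GeoJSON also allows.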
Just a brief note: this all sounds good to me. I agree about "addresses" over "dwellings," and having GeoJSON exports. I'm happy to write the readme files once the specific formats are determined. I can already picture the enhanced books dataset (with author nationalities and genders) but not the enhanced members with the addresses. What will the columns look like to show addresses at particular times?
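One possible way to picture the columns, sketched under assumptions: a separate addresses file with one row per address, where member_uri repeats so rows can be joined back to the members file and start_date / end_date (with their precision fields) carry the timing. The column selection, the FIELDNAMES list, and the write_addresses_csv helper below are hypothetical; the field names are borrowed from the quick script earlier in the thread, and the final format is still to be determined.

```python
import csv
import io

# Hypothetical column order; field names follow the quick export script.
FIELDNAMES = [
    "member_uri",
    "start_date",
    "end_date",
    "start_date_precision",
    "end_date_precision",
    "street_address",
    "city",
    "postal_code",
    "latitude",
    "longitude",
    "arrondissement",
]


def write_addresses_csv(records, out):
    """Write one CSV row per address record to the file-like object `out`."""
    # extrasaction="ignore" drops keys (e.g. internal ids) not listed above
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        # missing or None values are written as empty cells
        writer.writerow(rec)


if __name__ == "__main__":
    # illustrative values only: one member with two addresses over time
    sample = [
        {
            "member_uri": "https://example.org/members/example/",
            "start_date": "1921-01-01",
            "end_date": "1923-06-30",
            "street_address": "1 rue Exemple",
            "city": "Paris",
        },
        {
            "member_uri": "https://example.org/members/example/",
            "start_date": "1923-07-01",
            "end_date": None,
            "street_address": "2 rue Exemple",
            "city": "Paris",
        },
    ]
    buf = io.StringIO()
    write_addresses_csv(sample, buf)
    print(buf.getvalue())
```

With this shape, a member who moved shows up as several rows sharing the same member_uri, each with its own date range.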
I'm going to close this as a duplicate, since the work is being tracked on Princeton-CDH/mep-django#791 and Princeton-CDH/mep-django#792.
From Rebecca: Code updates are needed to revise the data export scripts before we can export and validate the datasets. We could export and publish new datasets with the existing code, but they wouldn't include all the updates they've been working on for dates and addresses, which are needed for the geographic analysis.