Add seed data and instructions #80

Merged
merged 9 commits into from
Mar 28, 2024

russtoku
Member

Preliminary information to seed a development database. This should be enough to get anyone started and provide a base to move forward from.

I'm still working on a Python script to create a category.json fixture and CSV file for uploading public body data from the CSV extract from the production UIPA.org website.

@russtoku russtoku mentioned this pull request Mar 21, 2024
2 tasks
@yenhtran

Thank you @russtoku for the great instructions!

yenhtran previously approved these changes Mar 22, 2024
@russtoku
Member Author

Apologies for the large number of changes:

  • Re-write of Seeding.md
  • Renamed the fixture files with a timestamp
  • Added some utility programs to help with preparing a public body CSV file that can be uploaded via the Admin website.

@russtoku
Member Author

These are working now:

  • Load the classification, jurisdiction, and foilaw fixtures.
  • Load the test categories and test public bodies.
  • Load the full list of categories from the 03-15-2024 extract of public bodies from UIPA.org.
  • Only 118 out of 201 public bodies can be loaded from the fixed 03-15-2024 extract of public bodies from UIPA.org. This should be enough to work with.
  • Two shell scripts can be used to "reset" the database and search engine indexes.

I'd like to suggest that we merge things at this point to give people something to work with. The seeding of all 201 public bodies will take a few more days. If I finish that before this is merged, I'll add it.

the public body information from a CSV file.

```
$ python manage.py loaddata data/seed/2024-03-15-classification.json
```
Sponsor Member

These filenames need to be updated


- Then, on the `Public Body` page (Home > Public Body > Public Bodies), scroll
down to the bottom of the page to where there is a `Choose File` button next
to the `Import Public Bodies` button.
Sponsor Member

Have you tried running this?

python manage.py import_csv data/seed/2024-03-24-public-bodies-fixed.csv

Docs: https://froide.readthedocs.io/en/latest/importpublicbodies/#importing-via-command-line

I gave it a quick shot, and it seemed to work:

➜  uipa git:(pr/80) python manage.py import_csv data/seed/test-public-bodies.csv
/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django_elasticsearch_dsl/documents.py:178: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security.
  response = bulk(client=self._get_connection(), actions=actions, **kwargs)
Import done.

tyliec previously approved these changes Mar 26, 2024
Sponsor Member

@tyliec tyliec left a comment

Left a few nitpick comments, but with this PR I was able to seed the portal with all the public bodies! 🚀

```
$ python manage.py loaddata data/seed/2024-03-24-categories.json
```
- On the Public Bodies page of the Admin website, upload the
Sponsor Member

Weirdly the previous import_csv command fails here:

➜  uipa git:(pr/80) python manage.py import_csv data/seed/2024-03-24-public-bodies-fixed.csv
/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django_elasticsearch_dsl/documents.py:178: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security.
  response = bulk(client=self._get_connection(), actions=actions, **kwargs)
Traceback (most recent call last):
  File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/manage.py", line 13, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/management/commands/import_csv.py", line 25, in handle
    importer.import_from_file(f)
  File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/csv_import.py", line 47, in import_from_file
    self.import_row(row)
  File "/Users/tylerchong/Desktop/workspace/codewithaloha/uipa/src/froide/froide/publicbody/csv_import.py", line 90, in import_row
    row["parent"] = PublicBody._default_manager.get(slug=slugify(parent))
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/tylerchong/.pyenv/versions/3.10.13/lib/python3.10/site-packages/django/db/models/query.py", line 637, in get
    raise self.model.DoesNotExist(
froide.publicbody.models.PublicBody.DoesNotExist: PublicBody matching query does not exist.

but going through the Web UI with the same file seems to work, so 🤷

Screenshot 2024-03-25 at 9 31 41 PM

Member Author

I'm not quite sure why it does that. I get the "PublicBody matching query does not exist." message when I upload via the Admin website. I get 113 public bodies loaded but the CSV file has 201.

Member Author

The error message is saying that the parent of the public body being loaded doesn't exist in the Public Body table yet. So things stop at this point. It's a data issue. I'll need to clean up the data a bit.
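
If the cause really is that a child row is read before its parent, one way to check and work around it is to reorder the CSV rows so that parents come first. The following is only a sketch under assumptions: the column names `name` and `parent__name` are guesses about the CSV layout, the helper `order_parents_first` is hypothetical and not part of froide, and it compares parent names exactly, whereas the importer looks the parent up by `slugify(parent)`.

```python
import csv
import io


def order_parents_first(rows, name_col="name", parent_col="parent__name"):
    """Reorder rows so each parent appears before its children.

    Returns (ordered, unresolved); unresolved rows name a parent that is
    missing from the file or that is part of a cycle.
    Column names are assumptions about the CSV layout.
    """
    pending = list(rows)
    ordered = []
    emitted = set()  # names already placed in `ordered`
    while pending:
        still_pending = []
        for row in pending:
            parent = (row.get(parent_col) or "").strip()
            if not parent or parent in emitted:
                ordered.append(row)
                emitted.add(row[name_col].strip())
            else:
                still_pending.append(row)
        if len(still_pending) == len(pending):
            break  # no progress: the remaining parents cannot be resolved
        pending = still_pending
    return ordered, pending


if __name__ == "__main__":
    # Tiny made-up sample, not real UIPA.org data.
    sample = io.StringIO(
        "name,parent__name\n"
        "Office A,Dept X\n"
        "Dept X,\n"
        "Office B,Missing Dept\n"
    )
    ordered, unresolved = order_parents_first(list(csv.DictReader(sample)))
    print([r["name"] for r in ordered])
    print([r["name"] for r in unresolved])
```

Rows left in `unresolved` would point at data that needs cleaning up rather than reordering.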

Sponsor Member

👍 - I'll merge in #81 then in the meantime

Member Author

The data/seed/2024-03-24-public-bodies-fixed.csv file appears to be good because after several attempts all 201 public bodies are loaded.

In my working directory, there appears to be something flaky going on with src/froide/froide/publicbody/csv_import.py in my virtual environment. If I add a few print statements, all 201 public bodies will load. I'm using Python 3.10.13 in a virtual environment created using the venv module. When only 118 or so public bodies are loaded, it is usually due to the name and the slug not matching up. This causes public bodies with parent bodies to fail to load.
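
A quick way to look for name/slug mismatches before importing might be a script like the one below. It is a rough sketch, not froide code: the `slug`, `name`, and `parent__name` column names are assumptions about the CSV, `find_unmatched_parents` is a hypothetical helper, and the local `slugify` is a standard-library stand-in that only approximates Django's ASCII `slugify`, which the traceback shows being applied to the parent name.

```python
import re
import unicodedata


def slugify(value):
    """Stand-in that approximates Django's ASCII slugify (an assumption)."""
    value = (
        unicodedata.normalize("NFKD", str(value))
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def find_unmatched_parents(rows, slug_col="slug", parent_col="parent__name"):
    """List rows whose slugified parent name matches no slug in the file.

    Returns tuples of (row name, parent name, slug that was looked for).
    Column names are assumptions about the CSV layout.
    """
    known_slugs = {row[slug_col].strip() for row in rows if row.get(slug_col)}
    problems = []
    for row in rows:
        parent = (row.get(parent_col) or "").strip()
        if parent and slugify(parent) not in known_slugs:
            problems.append((row.get("name", ""), parent, slugify(parent)))
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration, not real UIPA.org data.
    rows = [
        {"name": "Dept. of Health", "slug": "department-of-health", "parent__name": ""},
        {"name": "Office A", "slug": "office-a", "parent__name": "Dept. of Health"},
    ]
    for name, parent, wanted in find_unmatched_parents(rows):
        print(f"{name}: parent {parent!r} needs slug {wanted!r}")
```

Any row this reports would hit the same `DoesNotExist` lookup failure regardless of row order, so it separates slug problems from ordering problems.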

@russtoku
Member Author

I can't find a permanent fix for loading public bodies from a CSV file. I've updated the Seeding.md document to point this out. Loading about 113 public bodies should be enough to start developing with.

I'd like to see more people getting involved so hopefully the seeding of the database helps make the set-up process easier.

@russtoku
Member Author

I discovered that deleting and reloading the public bodies from a CSV file several times will eventually load all 201 public bodies correctly. I've added this as a note in Seeding.md.

Sponsor Member

@tyliec tyliec left a comment

Screenshot 2024-03-27 at 7 06 14 PM

🚀 I was able to load all 201 public bodies through the Django admin UI... eventually. No idea why it's flaky though 🤷

```
$ python extract_sets.py ../2024-03-15-Hawaii_UIPA_Public_Bodies_All.csv
```

Sponsor Member

nit: The formatting here is off:

Screenshot 2024-03-27 at 7 33 18 PM

@tyliec tyliec merged commit 39fad22 into CodeWithAloha:main Mar 28, 2024