
Named Entity Recognition + Valence Tests#226

Merged
xehu merged 0 commits into initial_package_version from shruti/ner-valence-test
Aug 7, 2024

Conversation

@agshruti12
Contributor

@agshruti12 agshruti12 commented Jun 10, 2024

Pull Request Template:
If you are merging in a feature or other major change, use this template to check your pull request!

Basic Info

What's this pull request about?

Tests for NER and valence. The NER tests are modified unit tests that check both the NUMBER of named entities and the named entities themselves. They have their own test dataset, 'test_named_entity.csv'. Valence is tested with the perturbation framework: for a given positive/negative sentence, I generated a corresponding INV and DIR perturbed sentence. The test checks whether the magnitude of the DIR change is greater than the magnitude of the INV change. These tests have their own dataset, test_chat_level_complex.csv. We can add more chat-level tests that use the perturbation framework to this csv.
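
As a rough illustration of the two checks described above, here is a minimal sketch. The function names and the example scores are hypothetical and are not taken from the test files in this PR; the actual tests may compare entities or scores differently.

```python
def ner_check(expected_entities, predicted_entities):
    """Return (count_ok, entities_ok) for one row of NER output.

    count_ok:    the number of predicted entities matches the expected number.
    entities_ok: the predicted entities are the same as the expected ones
                 (compared as sets, which is an assumption of this sketch).
    """
    count_ok = len(expected_entities) == len(predicted_entities)
    entities_ok = set(expected_entities) == set(predicted_entities)
    return count_ok, entities_ok


def passes_perturbation_check(original_score, inv_score, dir_score):
    """Return True when the DIR perturbation moves the score more than the INV one."""
    inv_change = abs(inv_score - original_score)
    dir_change = abs(dir_score - original_score)
    return dir_change > inv_change


# Hypothetical values, for illustration only.
ner_result = ner_check(["John", "New York"], ["John", "New York"])
valence_result = passes_perturbation_check(0.80, 0.78, -0.40)
print(ner_result, valence_result)
```

Here the INV sentence is expected to leave valence roughly unchanged, while the DIR sentence is expected to shift it, so a passing case has a larger DIR change than INV change.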

Feature Documentation

Did you document your feature? Make sure you do the following before you pull request!

  • Copy the Template. Go to the Feature Wiki and Copy/Paste the Feature Template into a new page.
  • Fill out the Template. Fill out the basic information for the feature in the template. Use the template to document your plan for implementation and major design decisions; if anything changes along the way, update the documentation as you go.
  • At the top of my feature, I indicate whether the feature is conversation level or chat level.

Code Basics

  • My feature is a .py file.
  • My feature uses snake case in the name. That means the format of the name is my_feature, NOT myFeature (camel case).
  • My feature has the name, NAME_features.py, where NAME is the name of my feature.
  • My feature is located in feature_engine/features.

Testing

  • I have thought about test cases for my features, with inputs and expected outputs.
  • I have linked to a location (e.g., .py or .ipynb) where I can run my test cases and show they work (inputs match expected outputs).

The location of my tests is here:

[PASTE LINK HERE]

If you check all the boxes above, then you are ready to merge!

@xehu
Collaborator

xehu commented Jun 17, 2024

I notice that not all the NER tests pass --- but this is because the feature isn't perfect! Would it be possible to run the feature on the full test dataset in order to get metrics (e.g., precision/recall), but then only run the test on a subset of the NER features that we know are supposed to work? That way, we won't have all the tests return as 'failing' ...
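
One way the metrics suggested here could be computed is sketched below. This is only an assumption about the approach: the function name and sample rows are hypothetical, and entities are matched by exact string equality within each row.

```python
def ner_precision_recall(expected_rows, predicted_rows):
    """Compute precision and recall for NER output, pooled across all rows.

    Each row is a list of entity strings. An entity counts as a true
    positive when it appears in both the expected and predicted lists
    for the same row.
    """
    tp = fp = fn = 0
    for expected, predicted in zip(expected_rows, predicted_rows):
        expected, predicted = set(expected), set(predicted)
        tp += len(expected & predicted)   # found and correct
        fp += len(predicted - expected)   # found but not expected
        fn += len(expected - predicted)   # expected but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical rows, for illustration only.
expected_rows = [["john", "new york"], ["google"]]
predicted_rows = [["john"], ["google", "monday"]]
precision, recall = ner_precision_recall(expected_rows, predicted_rows)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The pass/fail tests could then run only on the subset of rows the feature is expected to handle, while these metrics are reported on the full dataset.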

@agshruti12 agshruti12 force-pushed the shruti/ner-valence-test branch from cd9da03 to 4d91a21 Compare July 26, 2024 12:41
@agshruti12
Contributor Author

Changes made:

  1. assign_chunk_nums: removed temporal parameters
  2. get_all_DD_features: removed temporal parameters from assign_chunk_nums call
  3. burstiness: use wait_times starting at position 1 (omitting the first value), and ensure time_diff parsing works for both timedelta objects (when passing in a datetime timestamp) and float objects (when passing in a unixtime or integer timestamp).
  4. fflow: cosine similarity --> cosine distance
  5. tests for valence + NER + discursive diversity + variance in DD + incongruent modulation + within person discursive range + forward flow + team burstiness (across 3 different timestamps)
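
Items 3 and 4 above can be sketched roughly as follows. This is not the code in this PR: the helper names are hypothetical, and the burstiness formula shown, B = (sigma - mu) / (sigma + mu), is an assumption, since the formula is not stated in this thread.

```python
import math
import statistics
from datetime import timedelta


def to_seconds(time_diff):
    """Accept either a timedelta (datetime timestamps) or a number (unix/integer timestamps)."""
    if isinstance(time_diff, timedelta):
        return time_diff.total_seconds()
    return float(time_diff)


def wait_times(time_diffs):
    """Convert time differences to seconds, omitting the first value (position 0)."""
    return [to_seconds(d) for d in time_diffs[1:]]


def burstiness(waits):
    """Assumed formula: (sigma - mu) / (sigma + mu) over the wait times."""
    if len(waits) < 2:
        return 0.0
    mu = statistics.mean(waits)
    sigma = statistics.pstdev(waits)
    if sigma + mu == 0:
        return 0.0
    return (sigma - mu) / (sigma + mu)


def cosine_distance(u, v):
    """Cosine distance is 1 minus cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# The same gaps expressed as timedelta objects and as plain numbers.
from_datetimes = wait_times([timedelta(0), timedelta(seconds=5), timedelta(seconds=10)])
from_unixtime = wait_times([0, 5, 10])
print(from_datetimes, from_unixtime, burstiness(from_unixtime))
```

Both timestamp styles produce the same wait times, so the burstiness value does not depend on which timestamp format is passed in.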

@xehu xehu changed the base branch from main to initial_package_version August 7, 2024 21:10
@xehu xehu merged this pull request into initial_package_version Aug 7, 2024
@xehu xehu deleted the shruti/ner-valence-test branch August 7, 2024 21:35
xehu added a commit that referenced this pull request Aug 7, 2024
* add pyproject.toml

* Update README.md with team-comm-tools rather than team-process-map

* Update README.md to remove outdated requirement (#264)

* delete junit

* move preprocessing notebooks to tests

* remove unnecessary deps

* update requirements

* more slimming of reqs

* remove packaging related deps

* get rid of requests and xgboost deps

* addressing #267

* edit src paths

* test with python 3.7

* test  python 3.7

* test 3.8

* test 3.8 pt. 2

* test 3.9

* test 3.10

* restore to 3.11; play with docs

* Update README.md with new path names.

* standardize package structure and solve path issues

* update requirement.txt path in workflow

* update workflow file

* updating test workflow

* update test workflow

* update test workflow

* update test workflow

* update test workflow

* update imports in example

* fix bugs

* move lexiconx_dict.pkl to features/assets

* update dependencies

* delete legacy files and remove constant nltk import

* clean up package structure and warnings

* resolve relative imports issue in sphinx

* create single installation script

* commit setup script

* update setup script and documentationZ

* update README to point to setup script

* add linkes to website and Rtd to readme

* disable tokenizer parallelism to avoid error

* add badges to home page

* Named Entity Recognition + Valence Tests (#226)

* valence testing

* rearranging files

* intermediate ner testing

* NER testing

* fix featurizer

* fix featurize bug

* updating test dataset + function

* code coverage

* burstiness

* move testing FB's into run_tests.py

* move NER dataframe to test file

* adding complex tests back to run_tests.py

* add chat_complex_df and conv_complex_df to run_tests.py

* correct dataset paths

* rebase

* changing references as part of rebase

* correcting FB calls based on latest interface updates

* correct run_tests.py

* add dd tests

* burstiness fix

* dd tests add

* forward flow tests

* src changes

* testing timestamp variations

* src changes

* update test ds

* fix formatting

* fix formatting

---------

Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>
Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>

* Amy/website (#270)

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* deployed website

* copyright and team

* team headshots and footer

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* whitespace edits

* homepage updates

* feature table

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* homepage updates

* add table of features

* updated team page titles

* include flask in requirements.txt

* updates to table of features

* load pages from top

* fix to 404 issues

* moved build under website folder

* add flask back into requirements

---------

Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>
Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>

---------

Co-authored-by: sundy1994 <yuxuanzh@seas.upenn.edu>
Co-authored-by: Shruti Agarwal <46203852+agshruti12@users.noreply.github.com>
Co-authored-by: amytangzheng <145236844+amytangzheng@users.noreply.github.com>
xehu added a commit that referenced this pull request Aug 17, 2024
* valence testing

* rearranging files

* intermediate ner testing

* NER testing

* fix featurizer

* fix featurize bug

* updating test dataset + function

* code coverage

* burstiness

* move testing FB's into run_tests.py

* move NER dataframe to test file

* adding complex tests back to run_tests.py

* add chat_complex_df and conv_complex_df to run_tests.py

* correct dataset paths

* rebase

* changing references as part of rebase

* correcting FB calls based on latest interface updates

* correct run_tests.py

* add dd tests

* burstiness fix

* dd tests add

* forward flow tests

* src changes

* testing timestamp variations

* src changes

* update test ds

* fix formatting

* fix formatting

---------

Co-authored-by: Xinlan Emily Hu <xehu@wharton.upenn.edu>
Co-authored-by: Xinlan Emily Hu <xehu@cs.stanford.edu>
2 participants