nlp-spacy subject Exercise 1 lack of instructions & audit question about non-existing task #2345

Closed
jarmo-seljamaa opened this issue Dec 7, 2023 · 4 comments · Fixed by #2350 or #2355

jarmo-seljamaa commented Dec 7, 2023

nlp-spacy

The following instruction for Exercise 1 is insufficient:

Compute the embedding of car.

This would be fine if we could look up the keyword "embedding" in the provided API docs or in the documentation of the provided English language pipeline, but there appears to be nothing about it there. Only thanks to the audit question did I figure out that we are required to print the .vector.shape of the word embedding for the string car.

The audit of that task refers to an additional question 2 which is not present in the task description.

nprimo self-assigned this Dec 11, 2023
nprimo linked a pull request Dec 11, 2023 that will close this issue

nprimo commented Dec 11, 2023

Hi @jarmo-seljamaa, thank you for the feedback.

Regarding the wording, it would be nice to be able to find out more about it with a quick search of the documentation. However, you can easily find out what "embedding" means in the context of NLP with a quick Google or Wikipedia search.

Regarding the audit, I feel the wording of the questions is a bit off. I have already made some changes to make it clearer what needs to be checked during the audit.

jarmo-seljamaa commented

@nprimo you seem to have missed my point. Of course it is easy to look up what embedding means, but the goal of the AI piscine is to give us the skills and tools to learn to use NLP, and specifically spaCy in this quest. I therefore feel that the task instructions should state more precisely what the expected result of this computation is. In this specific case I would expect to see something like this: Compute the embedding of the word 'car' and print out its vector shape.

Furthermore, considering your suggested change to the audit questions, an additional line is needed in the task instructions: Using the computed embedding of the word 'car', print out its vector shape and the sum of the first 20 values of the vector.
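
For reference, here is a minimal sketch of what the proposed instruction seems to ask for. The helper name `embedding_summary` is made up for illustration, and the pipeline name in the usage comment is an assumption, since this thread does not say which English pipeline the subject provides.

```python
def embedding_summary(nlp, word="car", n=20):
    """Return (vector shape, sum of the first n vector values) for one word.

    `nlp` is expected to be a loaded spaCy pipeline that ships word vectors.
    """
    doc = nlp(word)
    vector = doc[0].vector  # embedding of the first (and only) token
    return tuple(vector.shape), float(sum(vector[:n]))


# Usage, assuming spaCy and a pipeline with word vectors are installed.
# The pipeline name is an assumption, not taken from the subject:
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   shape, first_20_sum = embedding_summary(nlp, "car")
#   print(shape)
#   print(first_20_sum)
```

The trailing digits of the sum can depend on how it is computed (for example the array dtype or the summation order), so the printed value may not match another machine's output digit for digit.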

jarmo-seljamaa commented

Ah, and by the way, I got 2.9790106 for the sum of the first 20 values, not 2.9790137708187103. Is the exact float value important here? Is that why the audit gives so many decimal places?
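
If the exact digits are not meant to matter, one option would be to compare the two values with a tolerance instead of expecting an exact match. The sketch below uses the standard library's `math.isclose`; the helper name `values_match` and the tolerance of `1e-5` are hypothetical, since the thread does not say what the audit accepts.

```python
import math

# Hypothetical relative tolerance; not taken from the audit.
REL_TOL = 1e-5


def values_match(observed, expected, rel_tol=REL_TOL):
    """Return True if the two floats agree within the relative tolerance."""
    return math.isclose(observed, expected, rel_tol=rel_tol)


observed = 2.9790106            # value reported in this comment
expected = 2.9790137708187103   # value quoted from the audit

print(abs(observed - expected))          # absolute difference between the two
print(values_match(observed, expected))  # whether they agree within REL_TOL
```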

nprimo reopened this Dec 12, 2023

nprimo commented Dec 12, 2023

Hi @jarmo-seljamaa, thank you for the additional feedback. I agree with the two points you raised. I will add more details to the subject and the audit to make it clearer what is supposed to be achieved.
