
Simple Text Analysis #7

Closed
UltimaBGD opened this issue Oct 30, 2019 · 8 comments

@UltimaBGD
Contributor

@UltimaBGD UltimaBGD commented Oct 30, 2019

As stated in the overview document, we would like to implement some form of text analysis.

Simple text analysis (word length, reading time, writing level, etc.) would be the focus of this issue. This can most likely be developed as a standalone feature before being integrated into the project. In essence, it can be built to work on any piece of text now, and then easily integrated once the project is advanced enough to support it.
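For reference, a standalone version of the simplest metrics could look something like the sketch below. It is only a sketch: it assumes words are whitespace-separated tokens and uses 200 words per minute as the reading speed, a common rule of thumb rather than anything decided in this issue. The function name `simpleTextStats` is hypothetical.

```javascript
// Sketch only: whitespace-separated "words" and a 200 words-per-minute
// reading speed are assumptions, not requirements from this issue.
const WORDS_PER_MINUTE = 200;

function simpleTextStats(text) {
  // Split on runs of whitespace; filter out the empty strings that
  // leading/trailing whitespace produces.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const wordCount = words.length;

  // Average word length in characters (punctuation attached to a word counts).
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  const averageWordLength = wordCount === 0 ? 0 : totalChars / wordCount;

  // Estimated reading time, rounded up to whole minutes.
  const readingTimeMinutes = Math.ceil(wordCount / WORDS_PER_MINUTE);

  return { wordCount, averageWordLength, readingTimeMinutes };
}

module.exports = simpleTextStats;
```

For example, `simpleTextStats("The quick brown fox")` returns `{ wordCount: 4, averageWordLength: 4, readingTimeMinutes: 1 }`.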

This train of thought could also work for checking posts with zero content, but that may be better left for another issue.
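If the zero-content check does get picked up, one possible interpretation is "nothing visible is left once markup and whitespace are removed". The helper `isZeroContentPost` and its `minChars` threshold are hypothetical, and stripping tags with a regex is a rough approximation, not a real HTML parser.

```javascript
// Hypothetical helper: a post has "zero content" if fewer than `minChars`
// visible characters remain after stripping tags, &nbsp; and whitespace.
function isZeroContentPost(html, minChars = 1) {
  const visibleText = html
    .replace(/<[^>]*>/g, " ") // drop tags (naive regex, not a real HTML parser)
    .replace(/&nbsp;/gi, " ") // treat non-breaking space entities as whitespace
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim();
  return visibleText.length < minChars;
}

module.exports = isZeroContentPost;
```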

I would like to work on this. Does anyone have ideas, or ways to improve it, in mind?

@humphd

Contributor

@humphd humphd commented Oct 30, 2019

I agree, let's not block this on getting the rest of the system built. We can begin building a set of discrete text analysis functions/endpoints/whatever with tests, and figure out how to plug them into the larger whole when it's further along.

What if we begin with a directory like src/analysis/* and start creating simple modules that essentially do:

module.exports = exports = async function(text) {
   // Return a Promise (analysis could take some time in more complex cases)
   // and do whatever work we need to do
   return doSomeAnalysis(text);
};

We can iterate on this design, but the most basic example (count chars) might be:

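// src/analysis/char-count.js
module.exports = exports = async function(text) {
   // An async function already wraps its return value in a Promise,
   // so an explicit Promise.resolve() isn't needed here
   return text.length;
};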

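// somewhere else in our code...
const charCountAnalysis = require('./src/analysis/char-count');
// ...

// `await` needs an async function (top-level await isn't available in CommonJS)
async function analyzePost() {
   const text = "This is my blog post...";
   try {
      const charCount = await charCountAnalysis(text);
      // ...use charCount here
   } catch(e) {
      // deal with error from analysis step, logging, etc.
   }
}
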
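One nice property of giving every analysis the same `async function(text)` shape is that they compose easily. A minimal sketch, assuming three inline stand-ins for hypothetical `src/analysis/*` modules, that runs them concurrently with `Promise.all` and collects the results into one JSON-friendly object:

```javascript
// Inline stand-ins for hypothetical src/analysis/* modules, each with the
// same shape: an async function that takes text and resolves to a value.
const analyses = {
  charCount: async (text) => text.length,
  wordCount: async (text) => text.split(/\s+/).filter(Boolean).length,
  sentenceCount: async (text) =>
    text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length,
};

// Run every analysis concurrently and merge the results into one object
// keyed by analysis name (easy to serialize as JSON).
async function analyzeAll(text) {
  const names = Object.keys(analyses);
  // Promise.all rejects as soon as any single analysis rejects.
  const values = await Promise.all(names.map((name) => analyses[name](text)));
  return Object.fromEntries(names.map((name, i) => [name, values[i]]));
}

// Example caller: one catch handles a failure from any analysis.
analyzeAll("This is my blog post. It has two sentences.")
  .then((result) => console.log(JSON.stringify(result)))
  .catch((err) => console.error("analysis failed:", err));

module.exports = analyzeAll;
```

Because `Promise.all` rejects on the first failure, a single `catch` covers the whole batch; `Promise.allSettled` would be the alternative if partial results are acceptable.
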
@jatinAroraGit

Collaborator

@jatinAroraGit jatinAroraGit commented Oct 30, 2019

Hey @UltimaBGD and @humphd ,
We could also use existing npm packages that already do this. On top of that, they can provide other characteristics of a post, such as sentiment and writing level, and support languages other than English.
I can look for some packages that can do this.

@humphd

Contributor

@humphd humphd commented Oct 30, 2019

@jatinAroraGit exactly!

const { someExistingFunctionWeWant } = require('external-module');

// Wrap the call to some external API we use in a promise
module.exports = exports = async function(text) {
   // If the external code isn't Promise-based, wrap its result in a Promise;
   // if it already returns a Promise, Promise.resolve() passes it through unchanged
   const value = someExistingFunctionWeWant(text);
   return Promise.resolve(value);
};
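For the case where the external code is callback-based rather than Promise-based, Node's built-in `util.promisify` can do the wrapping. In this sketch `legacyWordCount` is a made-up stand-in for such a module (it follows Node's error-first callback convention); it isn't a real package.

```javascript
const util = require("util");

// Made-up stand-in for an external, callback-based module (NOT a real package).
// It follows Node's error-first convention: callback(err, result).
function legacyWordCount(text, callback) {
  setImmediate(() => {
    if (typeof text !== "string") {
      callback(new TypeError("text must be a string"));
      return;
    }
    callback(null, text.split(/\s+/).filter(Boolean).length);
  });
}

// util.promisify converts an error-first callback API into a Promise-returning one.
const legacyWordCountAsync = util.promisify(legacyWordCount);

// Same module shape as the other analysis wrappers: an async function of text.
const wordCountAnalysis = async function (text) {
  return legacyWordCountAsync(text);
};

module.exports = exports = wordCountAnalysis;
```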
@jerryshueh

Contributor

@jerryshueh jerryshueh commented Oct 30, 2019

If we're doing any sort of NLP at all, I recommend we use Python for this particular component. Python's support for data science work is fantastic, with many libraries and modules that need just a single line of code to run. We can export the final metrics to a CSV file for use by the core systems, or to a JSON file if we need more structure. Just my two cents: I've personally never done any actual text analysis work in Node.js, so I can't speak to the conveniences and challenges of doing so.

@humphd

Contributor

@humphd humphd commented Oct 30, 2019

@jerryshueh you can run models in pretty much any backend ML engine (TensorFlow has bindings for C++, Python, Node.js, and the browser), but we can also leverage other tools. Our wrappers can return JSON, agreed, and then we can connect them to REST endpoints for others to consume.
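As a rough sketch of the "wrappers return JSON, connect them to REST endpoints" idea, using only Node's built-in `http` module: the route `POST /analysis/char-count`, the `{ "text": "..." }` request body, and `createAnalysisServer` are all assumptions for illustration. A web framework such as Express would be a reasonable alternative to the raw `http` module.

```javascript
const http = require("http");

// Hypothetical analysis wrapper in the async-function shape discussed above.
async function charCountAnalysis(text) {
  if (typeof text !== "string") throw new TypeError('"text" must be a string');
  return text.length;
}

// Sketch: expose the wrapper as POST /analysis/char-count, taking
// { "text": "..." } and answering with JSON. Route and body shape are assumptions.
function createAnalysisServer() {
  return http.createServer((req, res) => {
    const sendJson = (status, payload) => {
      res.writeHead(status, { "Content-Type": "application/json" });
      res.end(JSON.stringify(payload));
    };

    if (req.method !== "POST" || req.url !== "/analysis/char-count") {
      sendJson(404, { error: "not found" });
      return;
    }

    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => {
      body += chunk;
    });
    req.on("end", async () => {
      try {
        const { text } = JSON.parse(body);
        sendJson(200, { charCount: await charCountAnalysis(text) });
      } catch (err) {
        // Malformed JSON or a missing/non-string "text" field ends up here.
        sendJson(400, { error: err.message });
      }
    });
  });
}

module.exports = createAnalysisServer;
```

To try it locally, something like `createAnalysisServer().listen(3000)` would start the server.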

Let's not worry about which tech we use at this point, and focus instead on which analysis we want to do. Let the requirements (and existing code/stack) drive our decisions.

@cyh0968

@cyh0968 cyh0968 commented Nov 7, 2019

I think we can use reading-level to measure the level of writing.
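For comparison with whatever reading-level reports, the Flesch-Kincaid grade level is one well-known formula for scoring writing level: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The sketch below implements it by hand. It is not how the reading-level package works internally, and the vowel-group syllable counter is only a heuristic that will miscount some words.

```javascript
// Rough vowel-group syllable counter (heuristic; miscounts some English words).
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.match(/[aeiouy]+/g);
  let count = groups ? groups.length : 0;
  // A trailing silent "e" ("make") usually adds no syllable, but "-le" ("table") does.
  if (w.endsWith("e") && !w.endsWith("le") && count > 1) count -= 1;
  return Math.max(count, 1);
}

// Flesch-Kincaid grade level:
//   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
function fleschKincaidGrade(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.match(/[A-Za-z']+/g) || [];
  if (sentences.length === 0 || words.length === 0) return 0;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return (
    0.39 * (words.length / sentences.length) +
    11.8 * (syllables / words.length) -
    15.59
  );
}

module.exports = { countSyllables, fleschKincaidGrade };
```

Very simple text can score below zero (e.g. "The cat sat on the mat." comes out around -1.45), which is expected behaviour for this formula.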

@UltimaBGD

Contributor Author

@UltimaBGD UltimaBGD commented Nov 7, 2019

That seems like a good plugin!

@humphd

Contributor

@humphd humphd commented Nov 11, 2019

This is underway in numerous PRs now and follow-up Issues with more specific details. Closing in favour of the work happening there.

@humphd humphd closed this Nov 11, 2019
Main automation moved this from Discussion to Closed Nov 11, 2019
6 participants