Switching to the desktop layout (done) #41
Hi, thank you for your contribution. I changed the base branch from
Converting post contents to markdown with another npm module could be overkill and may also become a maintainability problem. Instead, we can implement something that removes all HTML tags around the string content, without necessarily converting it to markdown.
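To make the tag-stripping idea above concrete, here is a minimal sketch. The function name `stripHtmlTags` is hypothetical and not part of the project. It is regex-based rather than a real HTML parser, so it assumes reasonably well-formed markup and does not decode HTML entities.

```typescript
// Hypothetical helper (not from the project): reduce post HTML to plain text.
// Assumption: the input is the inner HTML of a post's content element.
function stripHtmlTags(html: string): string {
  return html
    // Keep explicit line breaks as newlines before dropping the tags.
    .replace(/<br\s*\/?>/gi, "\n")
    // Drop every remaining tag, keeping only the text between tags.
    .replace(/<[^>]*>/g, "")
    // Collapse runs of spaces and tabs left behind by removed tags.
    .replace(/[ \t]+/g, " ")
    .trim();
}

console.log(stripHtmlTags("<p>Hello <b>world</b></p>"));
```

This avoids an extra dependency, at the cost of losing formatting such as links and emphasis that a markdown conversion would preserve.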
@kaanyagci The point of the markdown format is to give users a minimal output that can be used to re-render a post's content exactly like the original. I only suggested it to avoid including the HTML, which may be large and is not human-readable.
After some testing, I've noticed that when you scrape without authentication, some posts don't provide the author profile URL; in that case the selector group_post_author won't work. Elements are also loaded quite differently in the desktop layout: they don't load until they enter the viewport, so we should start scrolling before scraping.
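One way to tolerate posts without an author profile link is to treat the link as optional and fall back to a name-only element. The sketch below is an illustration only: `ElementLike`, `readAuthor`, and the selector strings passed to it are hypothetical, not the project's actual selectors or types. `ElementLike` is a small subset of the DOM `Element` API so the example runs without a browser.

```typescript
// Minimal subset of the DOM Element API, so the sketch is self-contained.
interface ElementLike {
  querySelector(selector: string): ElementLike | null;
  getAttribute(name: string): string | null;
  textContent: string | null;
}

interface PostAuthor {
  name: string | null;
  profileUrl: string | null; // null when the post exposes no profile link
}

// Hypothetical helper: read the author from a post element without throwing
// when the profile link is missing (e.g. when scraping unauthenticated).
function readAuthor(
  post: ElementLike,
  authorLinkSelector: string, // assumed selector for the author <a> element
  authorNameSelector: string  // assumed selector for a name-only element
): PostAuthor {
  const link = post.querySelector(authorLinkSelector);
  if (link) {
    return {
      name: link.textContent?.trim() ?? null,
      profileUrl: link.getAttribute("href"),
    };
  }
  // No profile link: keep the name if it can be found, leave the URL empty.
  const nameOnly = post.querySelector(authorNameSelector);
  return { name: nameOnly?.textContent?.trim() ?? null, profileUrl: null };
}
```

Returning `null` for the URL keeps the output shape stable, so consumers can check one field instead of handling a failed selector.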
npm badges added
This is completely a mess! The function `getGroupPosts()` needs a full rewrite!
- A full rewrite for the scraper function.
- MutationObserver implementation.
- Fixed a bug when posts don't have text content.
- Now it clicks on the "See more..." button before extracting the post content!
@kaanyagci So yeah, I did it! The scraper now works perfectly with Facebook's new desktop layout, and it has the same functionality as the one on the master branch. I think it's time to merge this into the development branch (after reviewing and testing it, of course). Other features and new fields for the GroupPost interface should be added in a separate pull request to keep things organized!
@all-contributors please add @iMrDJAi for code
I've put up a pull request to add @iMrDJAi! 🎉
@iMrDJAi This is excellent news! I was really busy with other stuff today. I'll test this first thing tomorrow! Great job! 💯
@kaanyagci Any updates? Have you tested it? Any issues?
Sorry for the delay. I was still a little busy :( I'll look ASAP.
Just checked. Sadly I can not get it to work.

```ts
import { FB } from './index';

async function main() {
  const f = await FB.init({
    debug: true,
    output: 'test.json',
    headless: false,
    groupIds: ['774278349295443'],
    useCookies: true,
    disableAssets: true,
  });
  f.login('<redacted>', '<redacted>');
  await f.getGroupPosts(774278349295443, 'groupOutput');
}

main().then(() => {
  console.log('Done');
});
```

Gives the following output:

```
/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                    ^
Error: net::ERR_ABORTED at https://facebook.com
    at navigate (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async FrameManager.navigateFrame (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:819:16)
    at async Facebook.login (/Users/kaanyagci/Documents/makepad/fbjs/dist/lib/models/fb.js:113:9)
```

Note: The output is the same for both headless and not headless modes.
I'll try to investigate these issues as soon as possible this week.
@kaanyagci Interesting. In fact, I haven't tried logging in; I've always been testing in userless mode. I'll try that later and check what's going on.

```js
;(async () => {
    const { FB } = require("@makepad/fbjs")
    const fb = await FB.init({
        headless: true,
        useCookies: false,
        output: ''
    })
    //await fb.getGroupPosts("319144912641926", "./output.json")
    await fb.getGroupPosts("319144912641926")
})()
```
@kaanyagci That doesn't make sense, I'm 100% sure that I totally removed the mobile website. Fork my master branch again.
My bad, I was trying on another branch 🤦
This looks great actually. For the first issue, I've added the userAgent as Facebook rejects connections from headless browsers. I'll add this line once it's merged on
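The source does not show which user agent string was added, so the following is only a sketch of one common approach. Headless Chrome's default user agent contains the token `HeadlessChrome`; the hypothetical helper `maskHeadlessUserAgent` rewrites it to the regular `Chrome` token so the browser no longer announces itself as headless.

```typescript
// Hypothetical helper (not from the project): remove the "HeadlessChrome"
// marker that headless Chrome puts in its default user agent string.
function maskHeadlessUserAgent(userAgent: string): string {
  return userAgent.replace(/HeadlessChrome/g, "Chrome");
}

// Example default user agent of a headless Chrome build (version is illustrative).
const headlessUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) " +
  "HeadlessChrome/90.0.4430.0 Safari/537.36";

console.log(maskHeadlessUserAgent(headlessUa));
```

With Puppeteer, the resulting string could then be applied through `page.setUserAgent(...)` before the first navigation; whether that alone is enough to avoid the rejection is not confirmed by this thread.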
* Switching to the desktop layout (work in progress) (#41)
* 📝 Funding documentation added (#40)
* Added desktop layout selectors
* Added "See More" button selector
* Added xpath selectors + some improvements
* Small fix
* README updated (#42)
  npm badges added
* Checkpoint
  This is completely a mess! The function `getGroupPosts()` needs a full rewrite!
* Updated the scraper code
  - A full rewrite for the scraper function.
  - MutationObserver implementation.
* Improvements!
  - Fixed a bug when posts don't have text content.
  - Now it clicks on the "See more..." button before extracting the post content!
* Removed xpath selectors + a bunch of minor changes
  Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com>
* docs: add iMrDJAi as a contributor for code (#43)
* 📝 Funding documentation added (#40)
* README updated (#42)
  npm badges added
* docs: update README.md [skip ci]
* docs: update .all-contributorsrc [skip ci]
* Update README.md for missing badges
* Duplicated all contributors badge removed
  Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com>
  Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* feat(package): sponsor button added to the npm package
* fix(browser): facebook headless browser rejection issue fixed
  user agents added
* feat: concom configuration added
  Concom is a Conventional Commit formatter which is actually in private alpha release
* feat(version): Version incremented to 4.1.0
  Co-authored-by: ${Mr.DJA} <aoutou.d@gmail.com>
  Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* 📝 Funding documentation added (#40)
* Added desktop layout selectors
* Added "See More" button selector
* Added xpath selectors + some improvements
* Small fix
* README updated (#42)
  npm badges added
* Checkpoint
  This is completely a mess! The function `getGroupPosts()` needs a full rewrite!
* Updated the scraper code
  - A full rewrite for the scraper function.
  - MutationObserver implementation.
* Improvements!
  - Fixed a bug when posts don't have text content.
  - Now it clicks on the "See more..." button before extracting the post content!
* Removed xpath selectors + a bunch of minor changes
  Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com>
* README updated (#42) npm badges added * Switching to the desktop layout (work in progress) (#41) * 📝 Funding documentation added (#40) * Added desktop layout selectors * Added "See More" button selector * Added xpath selectors + some improvements * Small fix * README updated (#42) npm badges added * Checkpoint This is completely a mess! The function `getGroupPosts()` needs a full rewrite! * Updated the scraper code - A full rewrite for the scraper function. - MutationObserver implementation. * Improvements! - Fixed a bug when posts don't have text content. - Now it clicks on the "See more..." button before extracting the post content! * Removed xpath selectors + a bunch of minor changes Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> * docs: add iMrDJAi as a contributor for code (#43) * 📝 Funding documentation added (#40) * README updated (#42) npm badges added * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] * Update README.md for missing badges * Duplicated all contributors badge removed Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * feat(package): sponsor button added to the npm package * fix(browser): facebook headless browser rejection issue fixed user agents added * feat: concom configuration added Concom is a Conventional Commit formatter which is actually in private alpha release * feat(version): Version incremented to 4.1.0 * feat: tsconfig build information updated * Switching to the desktop layout (work in progress) (#41) * 📝 Funding documentation added (#40) * Added desktop layout selectors * Added "See More" button selector * Added xpath selectors + some improvements * Small fix * README updated (#42) npm badges added * Checkpoint This is completely a mess! The function `getGroupPosts()` needs a full rewrite! * Updated the scraper code - A full rewrite for the scraper function. 
- MutationObserver implementation. * Improvements! - Fixed a bug when posts don't have text content. - Now it clicks on the "See more..." button before extracting the post content! * Removed xpath selectors + a bunch of minor changes Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> * feat: tsconfig build information updated * [4.1.1] - Bug fixes (#47) * 🐛 Cookie file double extension issue fixed The issue was causing the impossibility to load cookies it is fixed by replacing the .json extension if exists by nothing * 🔧 TypeScript compiler configuration file updated examples source code is excluded * 🔇 Unnecessary console.logs removed * ✨ callback parameter added to get group posts * 🔧 last build info file added * 🔧 .npmignore file added this files contains files to ignore on npm module * 🚨 source file linted * 🔧 .eslintignore file updated example folder will not be linted * ⬆️ Dependency versions upgrade package-lock.json file updated * ✨ local file saving is now optional a parameter added to getGroupPosts function to save or not on a local file * 📝 README file updated Usage example added * 🔧 last build information added * 📝 Example app created * 🐛 group name normalisation issue fixed Output files by default will be named with group id instead of group name * 🔖 version incremented to 4.1.1 Co-authored-by: ${Mr.DJA} <aoutou.d@gmail.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* Added desktop layout selectors * Added "See More" button selector * Added xpath selectors + some improvements * Small fix * Checkpoint This is completely a mess! The function `getGroupPosts()` needs a full rewrite! * Updated the scraper code - A full rewrite for the scraper function. - MutationObserver implementation. * Improvements! - Fixed a bug when posts don't have text content. - Now it clicks on the "See more..." button before extracting the post content! * Removed xpath selectors + a bunch of minor changes * Added new fields to the GroupPost interface! * Bug fixes + now it parse posts one by one * Small changes - Updated the group name selector. - Now it scrolls down a little bit before start scraping to ensure that posts will load. * Fixed some issues with hovering * [4.1.1] - Bug fixes (#48) * README updated (#42) npm badges added * Switching to the desktop layout (work in progress) (#41) * 📝 Funding documentation added (#40) * Added desktop layout selectors * Added "See More" button selector * Added xpath selectors + some improvements * Small fix * README updated (#42) npm badges added * Checkpoint This is completely a mess! The function `getGroupPosts()` needs a full rewrite! * Updated the scraper code - A full rewrite for the scraper function. - MutationObserver implementation. * Improvements! - Fixed a bug when posts don't have text content. - Now it clicks on the "See more..." button before extracting the post content! 
* Removed xpath selectors + a bunch of minor changes Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> * docs: add iMrDJAi as a contributor for code (#43) * 📝 Funding documentation added (#40) * README updated (#42) npm badges added * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] * Update README.md for missing badges * Duplicated all contributors badge removed Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * feat(package): sponsor button added to the npm package * fix(browser): facebook headless browser rejection issue fixed user agents added * feat: concom configuration added Concom is a Conventional Commit formatter which is actually in private alpha release * feat(version): Version incremented to 4.1.0 * feat: tsconfig build information updated * Switching to the desktop layout (work in progress) (#41) * 📝 Funding documentation added (#40) * Added desktop layout selectors * Added "See More" button selector * Added xpath selectors + some improvements * Small fix * README updated (#42) npm badges added * Checkpoint This is completely a mess! The function `getGroupPosts()` needs a full rewrite! * Updated the scraper code - A full rewrite for the scraper function. - MutationObserver implementation. * Improvements! - Fixed a bug when posts don't have text content. - Now it clicks on the "See more..." button before extracting the post content! 
* Removed xpath selectors + a bunch of minor changes Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> * feat: tsconfig build information updated * [4.1.1] - Bug fixes (#47) * 🐛 Cookie file double extension issue fixed The issue was causing the impossibility to load cookies it is fixed by replacing the .json extension if exists by nothing * 🔧 TypeScript compiler configuration file updated examples source code is excluded * 🔇 Unnecessary console.logs removed * ✨ callback parameter added to get group posts * 🔧 last build info file added * 🔧 .npmignore file added this files contains files to ignore on npm module * 🚨 source file linted * 🔧 .eslintignore file updated example folder will not be linted * ⬆️ Dependency versions upgrade package-lock.json file updated * ✨ local file saving is now optional a parameter added to getGroupPosts function to save or not on a local file * 📝 README file updated Usage example added * 🔧 last build information added * 📝 Example app created * 🐛 group name normalisation issue fixed Output files by default will be named with group id instead of group name * 🔖 version incremented to 4.1.1 Co-authored-by: ${Mr.DJA} <42304709+iMrDJAi@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * Updates... - Hovering bug fix. - Renamed the GroupPost interface to.. just Post. - Added images field to the Post interface. * Minor changes * Update fb.ts Co-authored-by: Kaan Yagci <9104546+kaanyagci@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
The mobile layout of Facebook provides limited data and low-quality media. Because of that, the project should switch entirely to the desktop layout.
Todo list:
These are all the selectors explained: