Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could you implement an example of an archive website such as headbook or Weibo? #11

Open
GoYiz opened this issue Dec 19, 2021 · 2 comments

Comments

@GoYiz
Copy link

GoYiz commented Dec 19, 2021

Could you implement an example of an archive website such as headbook or Weibo?
I encountered some problems in the implementation process. For example, during the implementation of Weibo API, the Save option prompts backup processing, and the post content is not displayed after refresh.
Here are some code paths and contents I changed.

  1. packages\backend\src\Utility\parsePostId\parsePostId.ts
import { getUrlLastSegment } from '../../Utility/urlParse/getLastSegment';
.......
.......
export async function parsePostId(urlStr: string): Promise<string> {
    return getUrlLastSegment(urlStr);
  1. packages/backend/src/Components/Post/Service/postApi.ts
function getPostApi(postId: string): AxiosPromise {
    return crawlerAxios.get(`https://weibo.com/ajax/statuses/longtext?id=${postId}`);
  1. packages/backend/src/Components/Post/Service/postCrawler.ts
  private transformData(res: any): {
    repostingId: string;
    postInfo: IPost;
    userRaw: unknown;
    embedImages: string[];
  } {
    /* extract the information here. If it's a html document, 
    try to manipulate the html with cheerio */
    return res.postInfo;

Running platform : Gitpod

@Combo819
Copy link
Owner

Hi @GoYiz , This is the instance of headbook https://github.com/Combo819/headbook-archiver

The transformData is obviously incomplete. It should return an object of this type

{
    repostingId: string;
    postInfo: IPost;
    userRaw: unknown;
    embedImages: string[];
  }

but you just return res.postInfo.
You should make some transformation of the res object, and convert it into the type IPost. You can check the example here https://github.com/Combo819/headbook-archiver/blob/master/packages/backend/src/Components/Post/Service/postCrawler.ts#L162
The res from Weibo may have very different structure. So you need to write your own transformation based on the res structure.
BTW, if you're trying to implement a weibo instance, m.weibo.cn is easier than weibo.com.

@Combo819
Copy link
Owner

You can also assign https://weibo.com to BASE_URL in https://github.com/Combo819/headbook-archiver/blob/master/packages/backend/src/Config/constants.ts#L1
So you can write the API function like

 return crawlerAxios.get(`/ajax/statuses/longtext?id=${postId}`);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants